ADDING IMPERCEPTIBLE NOISE TO AUDIO AND OTHER TYPES OF
SIGNALS TO CAUSE SIGNIFICANT DEGRADATION WHEN
COMPRESSED AND DECOMPRESSED



CROSS-REFERENCE TO RELATED APPLICATIONS 
This is a continuation-in-part of co-pending patent application serial 
no. 09/667,345, filed September 22, 2000, which in turn is a continuation-in-part of 
co-pending patent application serial no. 09/570,655, filed May 15, 2000. This is also 
related to patent application serial no. 09/484,851, filed January 18, 2000, and its 
continuation-in-part application serial no. 09/584,134, filed May 31, 2000, 
hereinafter referred to as the "Secure Transmission Patent Applications." These four 
applications are expressly incorporated herein by this reference. 

BACKGROUND OF THE INVENTION 
This invention is related to the processing, transmission and recording 
of signals intended for interfacing with humans, particularly music and other audio 
signals, and, more specifically, to techniques that prevent or discourage the 
unauthorized copying and/or distribution of audio or other content of such signals. 

The ease with which music can be electronically distributed by private
individuals over the Internet is causing great concern on the part of the music content 
providers, their distributors and retailers. It is now possible for one compact disc to 
be purchased and, in a matter of hours, electronically distributed by the purchaser 
without charge to his or her friends, and even to people or enterprises unknown to the 
purchaser. Clearly, this reduces the desire of many to pay for the music and causes 
great concern on the part of the recording industry that their revenues and profits are 
being significantly eroded. Record labels are reacting by employing all legal means 
to prevent this unauthorized copying and distribution, and by fostering the 
development of technological means to make this unprecedented delivery of free 
audio entertainment significantly more difficult or impossible. 

What makes this electronic sharing of music over the Internet practical
is the availability of high caliber audio compression algorithms. These algorithms are
capable of reducing the data rates and data volumes, previously required to digitally
represent music, by a factor of more than 10, while maintaining acceptable audio
quality. The provider compresses the music data by such a factor and the recipient
then applies a mating decompression algorithm to the received compressed data to
recover something close to the original music. MP3 (MPEG 1 Layer 3) and AAC
(Advanced Audio Coding) are examples of commonly used compression algorithms
that offer this capability. DTS (Digital Theater Systems) and AC-3 compression
algorithms are professionally used for movie sound tracks and the like. A common
characteristic of these compression algorithms is that data of frequencies not
separately resolvable by the human ear are discarded, thereby to reduce the amount of
data necessary to be transmitted.

Psychoacoustic audio compression technologies, such as MP3 and
AAC, operate by making quantization noise imperceptible to the human hearing system.
In digital audio systems, such as those used by compact discs to deliver music to
consumers, 16 bit resolution is considered to be about the practical minimum number
of bits to use to keep the quantization noise down to an acceptable level (in this case
about 96 dB below the maximum signal level). The objective of an audio compression
algorithm is to use as few bits as possible to represent the input audio signal. In
order to use fewer bits, mechanisms need to be found to minimize the increased level
of quantization noise, or make this higher level of noise indiscernible to the listener.
The characteristics of the human hearing process provide several opportunities to do
the latter. The first is the basic threshold of hearing. Human ears tend to be less
sensitive at low and high frequencies. The second characteristic can be seen by
considering the structure of the inner ear. The cochlea is a spiral, tapering passage
with the basilar membrane stretched, more or less, across the diameter along its
length. Sound is conducted from the outer ear to the fluid in the cochlea where it
travels the length of the basilar membrane. Different frequency components of a
sound vibrate the hair cells at different locations along the membrane, stimulating the
auditory nerves. The frequency dependent movement of the hair cells makes the ear
act like a spectrum analyzer. A high level frequency component will not only vibrate
the hair cells at the location sensitive to that specific frequency, but it will also vibrate
the hair cells at some of the adjacent locations as well. The spreading of the response
to a specific frequency over multiple hair cell sensors can and will override, or
"mask", the response to other lower level, nearby frequency components. The ability
of relatively loud sounds to mask lower level ones is usually described by sets of
frequency and level-dependent "masking curves". If the quantization noise produced by
a coarse quantizer can be confined to the spectral region near to the signal component
being quantized (or encoded), and if that noise is low enough to fall below the
masking curve of the signal being coded, then the listener will not hear the quantization
noise. That is, the amount of data that represent spectral regions near to the signal
component being quantized can be reduced without it becoming noticeable to the
listener.
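
The figure of about 96 dB quoted above for 16 bit audio follows from the ratio between
full scale and a single quantization step. The following is a minimal sketch of that
arithmetic only; the function name is illustrative and is not part of the original
disclosure.

```python
import math


def quantization_range_db(bits: int) -> float:
    """Ratio, in dB, between full scale and one quantization step
    for samples of the given bit depth."""
    return 20.0 * math.log10(2 ** bits)


# 16 bit samples, as used on compact discs: roughly 96 dB.
cd_range_db = quantization_range_db(16)

# Each bit removed by a coarser quantizer raises the noise floor by about 6 dB.
per_bit_db = quantization_range_db(1)
```

Each bit a coarser quantizer removes therefore raises the noise floor by roughly 6 dB,
which is the noise that a compression algorithm must keep under the masking curve.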

What is needed is a means to permit this technology to serve the
recording industry's need for revenue and profits, by allowing Electronic Music
Distribution ("EMD") to be used as another channel of distributing and collecting
revenue for music product, while simultaneously preventing this same technology
from negatively impacting the industry. The present invention is directed in large part
to satisfying this need.

SUMMARY OF THE INVENTION

Briefly and generally, an electronic signal that is perceptible to the
senses of a human, such as an audio or video signal, is modified in a manner that is
not perceptible until, after the signal is compressed and decompressed, the
decompressed signal is noticeably degraded. The specific embodiments and examples
provided herein relate primarily to the processing of audio signals but the principles
used with audio signals also apply to other types of observed signals, such as video
signals.

An audio signal is modified in a manner that is not perceptible to the
human ear until, after compression according to one of various specific compression
algorithms, a decompressed version of the compressed signal is noticeably distorted
to the human ear. The audio signal may be modified by an amount such that a small
degradation is perceived by a limited number of trained observers but generally not
noticed by ordinary listeners. It is the imperceptibility to ordinary listeners that is
important, of course, not the perception of a relatively small number of audio experts.
A subsequent compression and decompression of the modified signal then results in a
reproduction of it that is perceived by ordinary listeners, as well as audio experts, to
be significantly degraded. The original audio signal is modified so that its subsequent
compression and decompression changes it from one that is acceptable to almost all
listeners to one that is not acceptable to those same listeners. The perceptibility of the
signal modifications can also be determined electronically by comparing the original
and the modified signals with data of masking characteristics of the human ear that
are in common use in sound signal processing, particularly as part of audio
compression and decompression techniques.

In a first embodiment, the original audio signal is so modified that
any such compression and decompression results in the distorted signal. In a second
embodiment, a compressed audio signal is modified in a manner that provides a high
quality signal when decompressed but which, when that decompressed signal is again
compressed and decompressed, results in a noticeably distorted signal. Providing
a sound signal that cannot be compressed without such
degradation of quality limits its distribution over the Internet since it is not currently
practical to distribute uncompressed sound signal files over the Internet. The time
taken to transmit uncompressed files and the computer storage space necessary to
hold them are far too large for the usual Internet user. Therefore, illegal distribution
of music over the Internet will be significantly reduced. Sales by music providers will
be maintained.

In a first example of the first embodiment of the present invention, an
audio signal is modified by increasing levels of its masked frequency components
while still retaining those levels below the masking level of a typical human ear. The
resulting distortion caused by this "anti-compression" processing of the signal is thus
not heard by a listener. But when the modified audio signal is compressed and then
decompressed by algorithms of the type discussed above, the resulting sound is
significantly degraded in quality. This is because the compression algorithm is
operating on a different sound signal than the original one that is desired to be
reproduced. As a result, the masking levels are different and the reduced number of
bits used to represent the spectrum are thus allocated differently. When these
different bit allocations are used to reconstruct the sound signal, it does not represent
the original signal. Indeed, the compression algorithm may need to allocate a limited
number of bits to an expanded portion of the signal's spectrum, thus not representing
the unmasked, audible portions with enough resolution. The resulting decompressed
sound signal is a significantly degraded, noisy version of the original signal and is
therefore not desirable for listening.

In a second example of the first embodiment of the anti-compression
techniques, relationships between multiple audio data channels are used. The example
of this embodiment employs the alteration of timing and/or phase relationships found
within an audio signal with two or more channels. Alteration of these relationships in
a multi-channel signal causes subsequent compression and decompression processes
to incorrectly combine the multiple channel data during the data reduction process,
and thus cause a degraded version of the original audio signal to be produced after the
compression process is complete.
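
The effect of a small inter-channel timing change on a channel-combining step can be
sketched as follows. The text above does not name a specific combining method; this
sketch assumes sum/difference ("mid/side") coding purely as an illustration, and the
one sample delay, the test tone and all function names are hypothetical. No claim is
made here about the audibility of the delay.

```python
import math


def mid_side(left, right):
    """Sum/difference representation of a two channel signal."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def energy(samples):
    """Sum of squared sample values."""
    return sum(v * v for v in samples)


def delay(samples, count):
    """Delay a channel by `count` samples, padding the start with zeros."""
    return [0.0] * count + samples[:len(samples) - count]


# Identical content in both channels: the difference channel is empty,
# so a combining encoder would need almost no bits for it.
left = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
right = list(left)
_, side_before = mid_side(left, right)

# After shifting one channel by a single sample the difference channel
# carries energy that the encoder must now spend bits on.
_, side_after = mid_side(left, delay(right, 1))
```

Under these assumptions `energy(side_before)` is zero while `energy(side_after)` is
not, which is one way a changed timing relationship can upset a data reduction step
that relies on similarity between channels.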

A third example of the first embodiment of anti-compression
techniques again uses relationships between multiple audio data channels. In this
case, data from one channel of a multi-channel signal is added to the data of another
channel of the multi-channel signal in a manner such that the donor signal is masked
by the receiver signal. This data is altered in phase on a periodic or aperiodic basis
and can also be altered in phase on a frequency component basis. The effect is to
once again cause a subsequent compression and decompression process, which
attempts to combine the data in the multiple channels as a strategy to reduce data rate,
to incorrectly perform this combination process and thus cause the resulting
compressed signal to be degraded when decompressed.

A fourth example of the first anti-compression embodiment once again
uses the relationships between multiple audio data channels, but in this case they are
used to unmask data embedded into the original signal that are masked by the audio
data prior to the compression process being performed.

In a fifth example of the first anti-compression embodiment, it is noted
that the mechanisms employed to reduce the data rate of monophonic and
multi-channel signals often employ detectors which monitor input audio signals, partial
results being available during the encoding process and/or included with the encoded
output signal characteristics. The results of this monitoring activity are used to
initiate different compression processing modes. These different modes are initiated
in order to encode special case audio signals with fewer artifacts. The selection
mechanisms driven by these detectors can and do make the wrong choices when
encountering unanticipated changes in audio signal characteristics. When this occurs,
an incorrect set of processing functions is employed to encode the incoming audio
signal and the resulting encoded output signal does not accurately reflect the
properties of the input signal. This fifth example of the first anti-compression
embodiment takes advantage of this fact by placing phase, timing and/or amplitude
discontinuities in the original signal, which are masked by the audio signal itself.
These discontinuities cause the aforementioned detectors to switch to an incorrect
mode with respect to the audio signal being processed, thus choosing an inappropriate
processing function for the audio signal being encoded. Thus, when the encoded
audio signal is decompressed, a compromised quality audio output is realized. These
discontinuities can be monophonic in nature, in that a mode detector's confusion can
be caused by discontinuities injected into only one channel of the data stream that are
independently analyzed with respect to activity in other audio channels. They can
also be multi-channel in nature, in that a mode detector's confusion can be caused by
injected discontinuities which are analyzed in relationship to activity in one or more
of the other audio channels.

In a second embodiment of the present invention, an encode/decode
compression algorithm pair is described which has the characteristic of producing
compressed audio data that can be decompressed for listening, but cannot be
compressed with quality for a second time, thus effectively disallowing
retransmission of the audio data over the Internet. A first example of this "one
generation" codec with built in anti-compression processing uses the addition of
noise or other data to achieve the desired unique results.

A second example of the second embodiment employs the generational 
characteristics of compression algorithms to a similar end. 

A third example of the one generation codec embodiment of the
present invention uses the fact that compression algorithms with improved
generational qualities often use additional techniques to reduce bit requirements
without adding quantization noise. These techniques, Huffman encoding for example,
form the basis of additional methods for producing compressed audio data that can be
decompressed for listening, but cannot be compressed with quality for a second time.
The unique concept, presented in this third example of the one generation codec, of
embedding data within a compressed audio signal that is decoded by a subsequent
decoding process as if it were part of the originally encoded data, and which is in a
form that is compatible with the compressed audio data which comprises said
compressed audio data stream, may be included as a central idea in all the examples
of the second embodiment of the present invention.

In a fourth example of the one generation codec embodiment of the
present invention, an alteration of the timing of the processing of defined blocks of
audio data is employed to create a compressed version of the original audio data that
displays high quality when decompressed and listened to, but will cause following
compression and decompression processes to be unable to choose the size and process
timing necessary to mask transient noise added to the audio data during the initial
compression process.

In a fifth example of the one generation codec embodiment, phase,
timing and/or amplitude discontinuities are inserted into one or more of the channels
of the encoded audio. These discontinuities are designed to be as imperceptible to the
human ear as possible when they appear in the decompressed audio. However, they
are tailored to cause the initiation of different compression processing modes in a
subsequent encoding (compression) process, as described in the fifth example of the
first anti-compression embodiment of this invention. The incorporation of these
discontinuities in the codec allows for the discontinuities to be embedded in the
encoded signal at the time of encoding, or the passing of discontinuity information
from the encoder to the decoder by means of carrying the additional discontinuity data
along with the encoded data stream in the data structure of the encoded signal. In the
former case, discontinuities are added to the encoded, compressed audio data itself
such that the decompression decoder will pass these discontinuities into the
decompressed data stream without acting upon them, and thus these discontinuities
will appear in the decompressed data stream with minimal or no alteration. In the
latter case, the mixing of the discontinuities with the decoded data stream takes place
in the decoder. This has two potential benefits. The first is to permit the original,
unprocessed encoded data stream to be recovered, if this should be desired. The
second is to make it possible to convert existing multi-generational codecs, such as
AAC and MP3, into single generation codecs, without the need to change the inner
processing structure of these codecs. This is because the discontinuity data can be
added to the decompressed signal after decoding. It should be noted that all
previously described one generation codec examples can be implemented in this
manner. It should also be noted that a decoder can be constructed such that the
discontinuity data is generated within the decoder, with no discontinuity information
passed to the decoder from the encoder. This discontinuity information is then
derived from analysis of the signal characteristics of the decoded audio signal and
mixed with the decoded audio signal before it is delivered to the user as a time
domain audio output.



WO 01/88915 



PCT/US01/15328 



A unique method of adaptively optimizing anti-compression
processing of audio data is also included as part of the present invention. For
example, any of the foregoing processing techniques can be adjusted as a function of
characteristics of the input audio signal being processed during such processing.

Finally, a unique concept is included that discourages computer hackers
from compromising the beneficial effects of the audio processing being disclosed,
and makes it difficult for them to do so.

In general, rather than using the principles underlying compression
algorithms to reduce the amount of audio signal data while maintaining quality, the
techniques of the present invention apply those principles to change the character of
the sound signal so that it cannot be compressed without significant degradation in the
quality of the signal. Indeed, existing compression algorithms have been designed to
allow a signal to be compressed and decompressed two or more times without
significant degradation of the quality of the signal that is perceptible to the human ear,
termed their "generational" quality. But the present invention uses the principles of
compression in a reverse manner, modifying a sound signal so that it will not retain its
quality when compressed. This contrary use of the principles underlying compression
algorithms greatly improves the ability of a music provider to control the distribution
of its music.

Additional features, advantages and objects of the present invention are
included in the following description of its embodiments, which description should be
taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 illustrates the processing of an audio signal according to the
present invention;

Figure 2 is a curve representing an audio signal being processed;

Figure 3 is an example frequency spectrum for a block of the audio
signal that shows its processing according to the present invention;

Figure 4 shows an example frequency spectrum for a block of the audio
signal after it is modified by the processing of the present invention;

Figure 5 illustrates a recording application of the present invention;

Figure 6 illustrates an Internet music delivery application of the present
invention;

Figure 7 shows a key card for use in the delivery application of Figure
6;

Figure 8 illustrates a one generation codec with built-in
anti-compression components as part of the compression process;

Figure 9 illustrates the application of "adaptive processing", referred to
as optimization, to maximize the difference between the high quality of a processed
but not compressed audio signal as compared with the reduced quality of a processed
and compressed audio signal;

Figure 10 shows a multi-channel audio compression encoding
technique with which various aspects of the present invention may be used;

Figure 11 illustrates a method of adding discontinuities to
multi-channel audio signals;

Figure 12 shows example frequency and phase characteristics of two
channel audio anti-compression filters of Figure 11;

Figure 13 provides example two-channel audio signal characteristics
and resulting compression algorithm encoding modes;

Figure 14 includes waveforms before and after an example
anti-compression processing according to an example of the present invention;

Figure 15 illustrates anti-compression processing according to an
example of the present invention; and

Figure 16 is a block diagram showing a single ended one-generation
encoding technique according to the present invention.

DESCRIPTION OF EXEMPLARY EMBODIMENTS 

First Embodiment: Audio Signal Anti-compression Examples 

The block diagram of Figure 1 shows an example anti-compression
signal modification system 511 of the first embodiment of the present invention,
which operates to process an input audio signal 513. The first three processing steps
515, 517 and 519 are substantially the same as those of a compression algorithm of
the type discussed above. In the step 515, a block of data of the signal 513 is
acquired. Referring to Figure 2, a portion 527 of the signal is shown divided into time
successive blocks, such as blocks 529 and 531. Preferably in a digital format, data
representing samples of the signal 527 during a block are quantized in the step 515.
The signal block is then filtered in a step 517 in order to obtain floating point
coefficients of the frequency spectrum of the block of data. Each sampled frequency
is expressed as an exponent (coarse measure) and mantissa (fine). Those values are
then used by a non-linear quantizer 519 to calculate a masking function 535 (Figure 3)
and compare it to the spectrum 533 of the block. When used as part of a compression
algorithm, the quantizer 519 also allocates a lesser number of bits than in the
incoming signal 513 to represent the signal in limited frequency ranges 537 where the
spectrum 533 is greater than the mask 535. The remaining frequency ranges are not
necessary to be included in the compressed signal since they are below the levels
indicated by the mask 535 that a human ear can hear. So they can be omitted, and it
is this omission that allows the amount of data representing the signal to be reduced.
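
The comparison of a block spectrum with its mask can be sketched in a few lines. This
is a simplified illustration, not the quantizer 519 itself: the spectrum and mask
values are made-up linear levels, the function names are hypothetical, and Python's
`math.frexp` merely stands in for whatever exponent and mantissa format a particular
codec uses.

```python
import math


def exponent_mantissa(coefficient: float):
    """Split a coefficient into (exponent, mantissa) such that
    coefficient == mantissa * 2 ** exponent, with 0.5 <= |mantissa| < 1."""
    mantissa, exponent = math.frexp(coefficient)
    return exponent, mantissa


def audible_bins(spectrum, mask):
    """Indices of frequency bins whose level exceeds the mask; these play
    the role of the ranges 537 that a compressor would keep."""
    return [k for k, (level, limit) in enumerate(zip(spectrum, mask))
            if level > limit]


# Hypothetical per-bin levels for one block and its masking function.
spectrum = [0.9, 0.2, 0.05, 0.6, 0.1]
mask = [0.5, 0.4, 0.3, 0.4, 0.3]

kept = audible_bins(spectrum, mask)   # bins a compressor would encode
```

With these example values only bins 0 and 3 lie above the mask; the remaining bins
are the ones a compression algorithm would omit.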

But since, in the technique being described, the input signal is not
being compressed, the bit allocations for the limited frequency ranges 537 need not be
calculated. Rather, a step 521 is added that does not exist in compression algorithms.
This step calculates increases that can be made to various frequency components of
the incoming signal 513. The block spectrum 533 and mask 535 calculated in the
non-linear quantizer 519 are used in this calculation. This calculation increases the
value of frequency components that are less than the mask 535, increasing the signal
spectrum 533 into shaded regions 539 of Figure 3. Since, as expressed by the
masking function, the human ear cannot separately resolve these frequencies, this will
not be perceived to degrade the signal, so long as the spectrum 533 is not increased
above the level of the mask 535. Indeed, it is preferable to maintain the spectrum 533
below the mask 535 by some margin in the regions 539 to assure that these added
signal components will not be heard by the human ear. Example margins are ten or
twenty percent of the level of the masking function 535.
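
A minimal sketch of the level calculation described for step 521 is given below. It
assumes linear per-bin levels and interprets the margin as a fraction of the mask
level; both are assumptions made for illustration, and the function name and sample
values are hypothetical.

```python
def raise_masked(spectrum, mask, margin=0.1):
    """Raise every component that lies below the mask up to
    (1 - margin) * mask, leaving audible components untouched."""
    raised = []
    for level, limit in zip(spectrum, mask):
        if level < limit:
            ceiling = (1.0 - margin) * limit
            # Never lower a component that already sits above the ceiling.
            raised.append(max(level, ceiling))
        else:
            raised.append(level)
    return raised


# Hypothetical per-bin levels for one block and its masking function.
spectrum = [0.9, 0.2, 0.05, 0.6, 0.1]
mask = [0.5, 0.4, 0.3, 0.4, 0.3]

modified = raise_masked(spectrum, mask, margin=0.1)
```

With a ten percent margin the masked bins 1, 2 and 4 are raised to about 0.36, 0.27
and 0.27 respectively, each still below its mask level, while bins 0 and 3 are left
unchanged.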

Furthermore, all frequencies in the regions 539 need not be raised
above the levels of the curve 533. The spectrum 533 needs to be altered only enough
that a subsequent application of a compression and decompression algorithm to
the modified signal causes undesirable perceptible distortions of the original signal
513.

And, as a further feature, the level of some frequency components of
the signal 533 may be increased above the mask 535 without affecting the quality of
the sound to the human ear, such as at frequencies adjacent to peak frequency levels of
the spectrum. This type of change to the signal 533 can also affect the ability of a
decompression algorithm operating on a compressed version of the altered signal to
provide a good quality decompressed signal.

Alternatively, changes to the spectrum 533 may be more modest so
that the modified signal can be subject to one compression and decompression cycle
without significantly degrading the quality of the incoming signal 513 but would
result in serious degradation if again compressed and decompressed. This partial
degradation has application to the Internet, wherein the partially degraded signal is
initially sent over the Internet and re-transmissions of the audio signal are discouraged
when a second or subsequent cycle of compression and decompression makes the sound
undesirable. This application is discussed below with respect to Figure 8.

In any event, the additional calculated signal is then added to the input 
signal 513 at 523 in order to provide a modified signal output 525. An 
implementation of the processing of Figure 1 includes a digital signal processor that 
operates under controlling software to perform the functions described above. 

The step 521 may determine in one of several ways the amount that the
level of the audio signal 513 is to be increased in the step 523 over a portion or all of
the frequency ranges 531. One way is to generate random or pseudo-random noise
that is uncorrelated with the signal 513 and add appropriate levels of such noise to the
signal in the block 523. Another way is to generate a defined signal, such as a sine
wave or a combination of sine waves of different frequencies, that is uncorrelated
with the audio signal, and then add such a signal(s) to the audio signal.
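
The first of these ways, adding pseudo-random noise at levels that stay under the
mask, might be sketched as follows. The sketch works on linear per-bin magnitudes and
treats magnitudes as simply additive, which is the worst case of in-phase addition
and therefore a conservative bound; the margin, seed, function name and sample values
are all assumptions for illustration.

```python
import random


def masked_noise(spectrum, mask, margin=0.1, seed=0):
    """Per-bin pseudo-random noise levels sized to fit in the headroom
    between the signal and (1 - margin) * mask. Bins without headroom
    receive no noise."""
    rng = random.Random(seed)
    noise = []
    for level, limit in zip(spectrum, mask):
        headroom = (1.0 - margin) * limit - level
        noise.append(rng.uniform(0.0, headroom) if headroom > 0.0 else 0.0)
    return noise


# Hypothetical per-bin levels for one block and its masking function.
spectrum = [0.9, 0.2, 0.05, 0.6, 0.1]
mask = [0.5, 0.4, 0.3, 0.4, 0.3]

noise = masked_noise(spectrum, mask)
combined = [level + added for level, added in zip(spectrum, noise)]
```

Because the noise is drawn independently of the signal it is uncorrelated with it,
and the combined level in every masked bin remains at or below the chosen ceiling.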

A further way to modify the audio signal 513 is to add an amount of
signal data that is correlated to it. This last technique may be implemented by simply
increasing the levels of the frequency components already in the signal that are below
the masking curve 535. This preserves the original audio qualities of the initial signal
because the added data is correlated with that signal. The added data is then also
difficult to distinguish from the original signal when listening to the resulting output
audio signal 525. One way to increase the signal levels is to multiply the levels of
some or all of the various frequency components of the audio signal 513 within the
frequency ranges 539 by a frequency dependent factor greater than unity to increase
the level of some or all of such frequencies to a level that is equal to or some defined
amount below the masking function 535.
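
The frequency dependent factor can be illustrated with a short sketch that derives
one gain per bin. As before, linear levels and a fractional margin are assumed, and
the function name and sample values are hypothetical. A bin with zero level cannot be
raised by multiplication, so it is left unchanged here.

```python
def masked_gains(spectrum, mask, margin=0.1):
    """Per-bin multipliers that lift masked components to
    (1 - margin) * mask; all other bins get a gain of 1."""
    gains = []
    for level, limit in zip(spectrum, mask):
        ceiling = (1.0 - margin) * limit
        if 0.0 < level < ceiling:
            gains.append(ceiling / level)   # factor greater than unity
        else:
            gains.append(1.0)               # leave the component unchanged
    return gains


# Hypothetical per-bin levels for one block and its masking function.
spectrum = [0.9, 0.2, 0.05, 0.6, 0.1]
mask = [0.5, 0.4, 0.3, 0.4, 0.3]

gains = masked_gains(spectrum, mask)
scaled = [gain * level for gain, level in zip(gains, spectrum)]
```

Since each component is only scaled, its phase and its relationship to the rest of
the signal are retained, which is what keeps the added data correlated with the
original.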

Yet another way to modify the audio signal 513 is to add a replica of
the original signal from one or more frequency bands, shifted in time by one
or more clock cycles with respect to the original audio signal, to the original audio
signal. The original audio qualities of the initial signal are preserved because the
added data is presented in very rapid sequence with respect to the original data and is
correlated with the original audio signal. Here again, the added data is also difficult
to distinguish from the original signal when listening to the resulting processed output
audio signal 525. One way to add this replicated time shifted data is to store a block
of the original audio signal's frequency domain coefficients, delay this coefficient
data in time, recreate a time domain representation from the frequency coefficient
data, and add this delayed time domain data back to the time domain representation of
the original signal. Another way is to first use a narrow band filter bank in the time
domain to separate the frequency components of the original signal into multiple
narrow bands. The frequency band or bands of the original audio data that are
most beneficial to replicate and delay by one or more clock cycles with respect to
the original audio data are then selected, based on which of these frequency
components will require the most bits to accurately represent the original signal in a
compressed version of the original signal. These frequency components are then
amplitude normalized with respect to the original signal, such that their amplitude is
above, equal to or below the masking curve amplitude defined by the frequency
components of the original audio signal, based on the masking properties associated
with each band of frequencies. Finally, this frequency band data is time synchronized
and combined with the original audio data. Subsequent compression of an audio
signal processed in either of these manners is degraded because a compression
algorithm will allocate additional bits to the added time shifted data in an effort to
maintain the quality of the compressed audio.
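
The core of the time shifted replica technique, adding a scaled and delayed copy of
the signal to itself, can be sketched as below. This simplified illustration operates
on the full-band time domain signal; the filter bank, the band selection and the
amplitude normalization against the masking curve are omitted, and the `gain`
parameter merely stands in for that normalization. The function name and values are
hypothetical.

```python
def add_delayed_replica(samples, delay, gain):
    """Return samples[n] + gain * samples[n - delay] for each n,
    treating samples before the start of the block as zero."""
    out = list(samples)
    for n in range(delay, len(samples)):
        out[n] += gain * samples[n - delay]
    return out


# A unit impulse makes the added replica easy to see.
impulse = [1.0, 0.0, 0.0, 0.0]
echoed = add_delayed_replica(impulse, delay=1, gain=0.25)
```

For the impulse input the result is `[1.0, 0.25, 0.0, 0.0]`: the original sample
followed one clock cycle later by its attenuated copy.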

The curves of Figure 4 illustrate the effect of one specific application
of the signal processing described with respect to Figures 1-3. A frequency spectrum
541 is shown for a block of the output audio signal 525 in the same time interval as
illustrated in Figure 3. The input signal 513 has been modified by increasing the level
of the spectrum 533 in all frequency ranges where it was below the mask 535 (shaded
regions 539) up to the level of the mask 535. This represents the maximum increase
of the input signal 513 that is desirable, and, as discussed above, is normally more
than is prudent to add. The main point to note from Figure 4 is that the
output signal 525 now has a different frequency spectrum than the input signal 513. If
the output signal is then compressed by the type of algorithm discussed above, a
resulting mask 543 is different. The mask of a block is calculated as part of
compression algorithms from the frequency spectrum of the block itself and, in some
algorithms, from data of the frequency spectra of adjacent blocks occurring in time
before and/or after the block represented by Figure 4.

The example shown in Figure 4 shows a large extent 545 of 
frequencies where the spectrum 541 is higher than the mask 543. The compression 
algorithm then must allocate its limited number of bits across the frequency bands 545 
which are much larger in extent of frequency than the bands 537 (Figure 3) of 
frequencies for the original signal 513. Further, the signal spectrum 541 (Figure 4) of 
the output signal 525 is much different than the spectrum 533 (Figure 3) of the input 
signal 513, differences being noted over ranges 547 of frequencies. At the same time, 
the increased signal has the effect of causing the signal spectrum 541 and the mask 
543 calculated (at least in part) from it to follow each other more closely (curves of 
Figure 4 vs. those of Figure 3). This also makes the signal less compressible after the 
signal has been increased. The result is a compressed signal calculated from the 
output signal 525 that is much different than one calculated from the input signal 513. 
The output signal 525, because of the nature of the data intentionally added to the 
input signal 513, does not lend itself to compression if a faithful reproduction of the 
input signal 513 is desired upon decompression. 
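
By way of illustration only, the Figure 4 operation of raising the spectrum up to the mask can be sketched as follows. The function name `raise_to_mask`, the `fraction` parameter, the per-band dB representation and the numeric levels are all assumptions made for this sketch and are not taken from the figures.

```python
def raise_to_mask(spectrum_db, mask_db, fraction=1.0):
    """Raise each band that lies below the mask toward the mask level.

    fraction=1.0 fills a band all the way up to the mask (the maximum
    increase described for Figure 4); smaller values leave a margin.
    Interpolating in dB is an assumption of this sketch.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    out = []
    for s, m in zip(spectrum_db, mask_db):
        if s < m:
            out.append(s + fraction * (m - s))  # band was below the mask
        else:
            out.append(s)                       # band already above the mask
    return out


# Hypothetical per-band levels in dB.
spectrum = [60.0, 20.0, 45.0, 10.0]
mask = [40.0, 35.0, 30.0, 25.0]
filled = raise_to_mask(spectrum, mask)          # bands 1 and 3 rise to the mask
backed_off = raise_to_mask(spectrum, mask, fraction=0.5)
```

A `fraction` below 1.0 corresponds to adding less than the maximum, which the text above notes is normally the more prudent choice.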

Like psychoacoustic based compression processes, the embodiment
described above transforms the complex audio signals that are input to the system into
the frequency domain, and masking curves for the different signal components are
computed. The masking (hearing) threshold curves are compared with the spectrum
of the input audio signal, and the limits on the level of quantizing noise or other added
data that can be "hidden" by the audio signal input to the system are thus determined.
In the compression processing case, the encoder then makes decisions about the
coarseness of the quantizer, or the number of bits that need to be assigned to each of
the frequency components of the audio signal, in order to assure that the added
quantizing noise, caused by the coarser quantizing process, is masked and thus
imperceptible to the listener. In the case of the techniques being described herein,
however, this information is employed to determine how much extra noise, for
example, can be added to the original audio signal input to the system, before this
noise can be heard by the listener. Unlike the compression processing case, in which
the output signal is the lower data rate, more coarsely quantized signal, the present
techniques output the original signal with noise added on a frequency component by
frequency component basis, the level of added noise chosen to be just low enough to
be masked by adjacent frequency components in the original audio signal. The audio
output signal then no longer has the uniform low level noise floor of the original input
audio signal. Instead it has a dynamically changing, program dependent noise floor.
If this digital audio signal is converted into its analog audio presentation and listened
to, the added noise will properly be masked by the adjacent higher level frequency
components in the signal, and thus not heard. If, however, this processed signal is fed
into a compression encode/decode process for Internet distribution, the additional
quantizing noise caused by this following audio compression/decompression process
will add to the noise injected into the audio signal by the techniques described above.
The resulting audio signal will then contain a total noise which is over the masking
curve limit, and thus the noise will be perceptible to the listener. These noise artifacts
will make the compressed audio signal unsuitable for distribution over the Internet,
which is an objective of the present invention. It should be noted that the injected
"noise" can have a wide range of characteristics. These characteristics are chosen to 
be most annoying to the listener in the event the noise is made perceptible by a 
follow-on compression process. 
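
The way two individually masked noise contributions can combine to exceed the mask may be illustrated with a short calculation. This is a sketch under stated assumptions: the two noise sources are treated as uncorrelated so that their powers add, and the mask, injected noise and codec noise levels are hypothetical numbers chosen only for the example.

```python
import math


def db_sum(*levels_db):
    """Total level, in dB, of uncorrelated noise sources given in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


mask_db = 30.0      # hypothetical masking threshold in one frequency band
injected_db = 29.0  # noise added by the anti-compression process, under the mask
codec_db = 28.0     # hypothetical quantizing noise of a later codec, under the mask

total_db = db_sum(injected_db, codec_db)
audible = total_db > mask_db  # the combined noise now exceeds the mask
```

Each contribution alone stays below the 30 dB mask in this example, while their sum is roughly 31.5 dB and therefore no longer masked.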

In a second method, timing and/or phase relationships between two
channels (a stereo pair) of an audio signal composed of two or more channels are
modified. This modification can be a fixed phase or timing change, or a phase or
timing change that varies over time. In addition, the modified phase or timing
relationship can be different for each audio frequency encountered in the original
audio signal. This technique is designed to work best with "Intensity" stereo or
"Coupled" multi-channel compression processes. Intensity stereo and coupled
compression processes are well known in the art. These methods combine input audio
data from two or more channels above a predefined frequency, and retain only the
intensity of the total energy appearing in each frequency band above this predefined
frequency. In this approach the intensity envelope of the total energy is encoded on a
frequency by frequency basis, and the amplitude of the signal in each channel is
retained. This channel amplitude information is delivered separately in the encoded
bit stream to the decoder, so that the decoder can parcel the monophonic intensity
envelope to each channel based on the original amplitude of the signal that appeared
in any particular channel. By altering the phase or timing of the information in pairs
of these channels with respect to each other, before they are combined, common data
appearing in each channel pair cancel, or partially cancel, during the combining
process. This results in an output after the decompression process which varies in
amplitude, quite unlike the original stereo audio signal. By this means, a degraded
version of the original audio signal will be produced after the
compression/decompression cycle, but, because human hearing cannot easily detect
phase variations, the stereo audio will sound normal before the
compression/decompression process.

A simple implementation of the above concept calls for advancing or
retarding the phase of one channel with respect to the other by a predetermined
number of degrees, for example 180 degrees, of all frequencies above a predetermined
frequency. 1500 Hz has proven to be a good frequency to choose for this purpose.
This process produces an audio signal which sounds identical to the original stereo
audio signal, but will be degraded by a subsequent compression process which
employs intensity stereo techniques. The resulting intensity stereo compressed and
decompressed audio signal sounds very much as if it is emanating from an underwater
source because of the amplitude variations introduced in the audio program material
by complete or partial phase cancellation as described above. A similar effect can be
produced if, instead of introducing 180 degree phase inversion above a predefined
frequency, one of the two channels of the stereo audio pair being processed is
advanced or retarded in time with respect to the other channel. This can be
implemented in the digital domain by advancing or retarding one of these two
channels with respect to the other channel by 1 or more bits.
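
The cancellation that occurs when two such channels are later combined can be sketched numerically. The example below is illustrative only: it models a single tone common to both channels as a unit phasor, treats the combining step as a plain sum, and interprets the time shift as a delay of whole samples at an assumed 44100 Hz sampling rate. The function names and the chosen frequencies are hypothetical.

```python
import cmath
import math


def downmix_gain(phase_offset_rad):
    """Magnitude of L+R for a unit tone common to both channels when
    one channel is offset in phase relative to the other."""
    return abs(1 + cmath.exp(1j * phase_offset_rad))


def delay_phase(freq_hz, delay_samples, rate_hz=44100):
    """Phase offset produced by delaying one channel by whole samples."""
    return -2 * math.pi * freq_hz * delay_samples / rate_hz


unchanged = downmix_gain(0.0)                   # channels in phase: tone doubles
inverted = downmix_gain(math.pi)                # 180 degree inversion: tone cancels
delayed = downmix_gain(delay_phase(3150.0, 7))  # 7-sample delay is 180 degrees at 3150 Hz
mild = downmix_gain(delay_phase(1000.0, 1))     # 1-sample delay barely affects 1000 Hz
```

A fixed delay therefore cancels some frequencies almost completely while leaving others nearly untouched, which is consistent with the partial cancellation described above.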

A more advanced version of the above concept calls for modulating the
timing and/or phase of a particular frequency or frequencies. For example, a
modulation rate below the lowest or above the highest frequency the human ear can
detect can be employed. Such a rate could be 1 Hz. The modulation would be imposed
on one or more frequency components present in one channel of a stereo channel pair
as compared to the other channel of the stereo channel pair. This phase modulation will
not significantly affect the processed original stereo audio data, but, when the
processed data is compressed and decompressed by the use of an intensity stereo
compression algorithm, causes an audio output whose amplitude varies in time and is
quite degraded. This degradation is caused by the varying phase cancellation of the
data which is common to both channels.
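
The time-varying cancellation can be sketched as follows. This is a minimal illustration, assuming a single unit-amplitude tone common to both channels, a sinusoidal phase modulation applied to one channel, and a combining step that simply sums the channels; the function name, modulation depth and step rate are hypothetical.

```python
import math


def downmix_envelope(mod_rate_hz, depth_rad, duration_s, steps_per_s=20):
    """Amplitude of L+R over time for a common unit tone when the phase of
    one channel is modulated sinusoidally at mod_rate_hz."""
    envelope = []
    for i in range(int(duration_s * steps_per_s)):
        t = i / steps_per_s
        phase = depth_rad * math.sin(2 * math.pi * mod_rate_hz * t)
        # Two unit phasors separated by `phase` sum to 2*|cos(phase/2)|.
        envelope.append(2 * abs(math.cos(phase / 2)))
    return envelope


# One second of a 1 Hz modulation with a peak deviation of 180 degrees.
env = downmix_envelope(1.0, math.pi, 1.0)
```

In this example the summed amplitude swings between full level and near silence once every half second, even though each channel on its own keeps a constant amplitude.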

In a third example of the first embodiment of anti-compression,
relationships between two or more audio data channels are again used to create an
audio signal that will cause a compression and decompression process, which
attempts to combine data in multiple channels as a strategy to reduce data rate, to
incorrectly perform this combination process during encode and thus cause the
resulting decoded signal to be degraded when decompressed. In this technique, data
from one channel of a stereo pair of a multi-channel signal is reversed in phase and
added, in the frequency domain, to data in the other channel of the stereo pair. For
clarity of discussion we will call one of these channels the "right" or "R" channel and
the other channel the "left" or "L" channel. Any two channels of a multi-channel
audio signal, that is an audio signal with three or more channels, can be designated for
the purposes herein as the "R" and "L" channels. The use of "R" and "L"
nomenclature refers to a two channel stereo music source solely to aid in visualizing
the concept, but there is no intent to limit this technique to such a source. Care is
taken to insert this cross-channel data in a manner such that the donor channel signal
data is masked after insertion into the receiver channel and does not significantly
affect the quality of the resulting pre-compressed audio signal.

There are three separate approaches to reach this objective. One, insert signals
from the L channel into the R channel that are under the masking threshold of the L
channel. Two, insert signals from the L channel into the R channel which are not
under the masking threshold of the L channel, but under the masking threshold of the
R channel. Three, insert signals from the L channel into the R channel that are under
both the L and R masking thresholds. To further add to the post compression
degradation of the resulting signal, the added L to R cross-signal can be reversed in
phase on a periodic or aperiodic basis. To further increase the anti-compression
effect, the reversed phase L signal can be periodically or aperiodically inserted and
not inserted into the R channel. Additional anti-compression effects can be realized
by reversing the phase of only some of the frequency components of the L signal that
is added to the R signal. For example, the phase of every second or third frequency
bin of the L signal can be reversed before the L signal is inserted into the R channel.
Note that although this discussion has referred to the addition of L data in the R
channel, this is for example purposes only. The technique is equally valid for the
insertion of R data into the L channel.
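
A minimal sketch of the cross-channel insertion with selective phase reversal is given below. It is illustrative only: the frequency bins are represented as complex numbers, the caller supplies whichever masking magnitudes apply (L, R, or the lower of the two, per the three approaches above), and the function name and the `margin` and `every` parameters are hypothetical.

```python
def insert_cross_channel(left_bins, right_bins, mask_mag, margin=0.5, every=2):
    """Add a scaled copy of the L bins into the R bins.

    Each inserted bin is scaled so that its magnitude is `margin` times the
    masking magnitude for that bin, and the phase of every `every`-th bin
    is reversed before insertion.
    """
    out = []
    for k, (l, r, m) in enumerate(zip(left_bins, right_bins, mask_mag)):
        mag = abs(l)
        if mag == 0.0:
            out.append(r)                 # nothing to donate in this bin
            continue
        donor = l * (margin * m / mag)    # magnitude is now margin * m
        if (k + 1) % every == 0:
            donor = -donor                # reverse phase of every `every`-th bin
        out.append(r + donor)
    return out


# Hypothetical four-bin spectra and masking magnitudes.
left = [1 + 0j, 0 + 2j, 0j, -4 + 0j]
right = [10 + 0j, 10 + 0j, 10 + 0j, 10 + 0j]
mask = [2.0, 2.0, 2.0, 2.0]
processed_right = insert_cross_channel(left, right, mask)
```

Setting `every=3` reverses every third bin instead, and calling the function with the roles of the two channels exchanged corresponds to inserting R data into the L channel.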

A fourth method of modifying audio signal 513 once again uses the
relationships between multiple audio data channels. In this case spurious data which
is masked by the original audio signal is embedded into each channel of the original
audio signal. This data is caused to be "unmasked" when the audio signal is
compressed. One example of this approach is to first alter or totally reverse the phase
of one channel of a stereo audio signal with respect to its other channel. This
alteration in phase, which could be either fixed, varying in time, or applied
periodically or aperiodically, could be implemented on frequencies which lie above a
predetermined frequency, over a range of frequencies, or over one or more bands of
frequencies. The spurious data is then added in phase into both channels. By
choosing the spurious data such that it is below the masking threshold of the original
audio signal, the spurious data will be inaudible when this now processed audio signal
is reproduced for listening. However, if this signal is compressed using an intensity
stereo encoder and then reproduced for listening, the original stereo audio signal will
be reduced in amplitude due to phase cancellation between the channels, while the
spurious data will be increased in amplitude, due to phase addition. This will result in
a reduced masking level and an increased spurious data level. It will then follow that
the embedded spurious data will be above the lowered masking threshold and be
audible to the listener.

A modification of the above strategy is to add spurious data, at a
selected frequency or frequencies, continuously, periodically or aperiodically, to one
channel of a stereo audio signal, phase shift this added data by 180 degrees, and add it
to the second channel of the stereo audio signal. The intensity and frequency
components of this added signal energy would be chosen to be below the masking
threshold set by the audio data in each channel. Being 180 degrees out of phase, the
spurious data added to the two channels would additionally tend to cancel when
reproduced either in free air, through speakers or through headphones, and thus be
virtually inaudible to the listener. When the audio processed in this manner is
encoded with a compression algorithm that sums the absolute values of one or more
of the frequency components in each channel of said two channel audio signal in
order to reduce the data rate requirements of the compressed signal, the absolute
values of the embedded spurious signals in each channel will constructively add and
the embedded spurious signals will become audible to the listener.
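
Both variants of the fourth method can be illustrated with single-bin arithmetic. The sketch below is an assumption-laden simplification: each channel is reduced to one complex frequency bin, the music component is taken to be identical in both channels, and the two combining rules shown (a plain sum and a sum of absolute values) stand in for whatever a particular encoder actually does. All names and amplitudes are hypothetical.

```python
def downmix(left, right):
    """Plain sum of the two channels."""
    return left + right


def magnitude_sum(left, right):
    """Sum of the absolute values of the two channels."""
    return abs(left) + abs(right)


music = 1.0 + 0.0j   # component common to both channels
spur = 0.05 + 0.0j   # spurious component, well below the music level

# First variant: music phase-reversed in one channel, spurious data in phase.
left1, right1 = music + spur, -music + spur
mixed1 = downmix(left1, right1)      # music cancels, spurious data doubles

# Second variant: spurious data 180 degrees out of phase between channels.
acoustic = downmix(spur, -spur)      # cancels when the channels add directly
coded = magnitude_sum(spur, -spur)   # adds when absolute values are summed
```

In the first variant the combined bin contains only the doubled spurious data, so nothing remains to mask it; in the second, the spurious data vanishes in a direct sum but doubles when magnitudes are summed.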

A fifth example of the first anti-compression embodiment takes
advantage of compression strategies that detect characteristics of input and in-process
audio data. These strategies modify their processing parameters, and/or approach, as a
function of these detected characteristics. Audio data compression mechanisms that
use different signal processing modes are employed by both monophonic and
multi-channel encoders. Two examples of such audio compression strategies are
"Middle/Side" or "M/S" stereo encoding, sometimes referred to as "Sum/Difference"
stereo encoding, for compressing two channel audio signals, and "window switching",
which is used for monophonic as well as multi-channel audio data compression.
United States Patent 5,285,498, "Method And Apparatus For Coding Audio Signals
Based On Perceptual Model", of James D. Johnston, describes these two approaches in
detail and is incorporated in its entirety herein by this reference. These different
modes are "switched in" when special case audio signals are detected in order to
encode these signals with the least audio artifacts at the lowest data rate possible.

The selection mechanisms driven by these detectors can and do make
the wrong choices when encountering unanticipated changes in audio signal
characteristics. When this occurs an incorrect set of processing functions is
employed to encode the incoming audio signal and the resulting encoded output signal
does not accurately reflect the properties of the input signal. The present example of
the first anti-compression embodiment takes advantage of this fact by inserting
discontinuities into the original signal which cause the encoder to switch to an
incorrect mode with respect to the audio data being processed. These discontinuities
can be phase, timing, frequency, amplitude or other signal discontinuities. For
instance, they can take the form of frequency components that have been added to or
periodically removed from the original audio signal. Thus, when the encoded audio
signal is decoded, a compromised quality audio output is realized. These
discontinuities can be monophonic in nature. In this case, the mode detector's false
analysis is prompted by discontinuities in a single channel of the audio data stream,
without regard to activity in other channels of the audio data stream. They can also be
multi-channel in nature. In this case the mode detector's confusion is caused by
discontinuities which are analyzed in relationship to activity in one or more of the
other audio data channels.

It has been found that human listeners are most disturbed by audio
whose characteristics change over time. If the aforementioned discontinuity causes
the encoder to permanently switch to a mode which is inappropriate for a particular
input audio selection, for example a certain selection of music, the decompressed
decoded output will indeed be degraded as compared to the original signal. However,
this degradation will be displayed by the music from its inception to its completion
and the listener may become accustomed to the sound quality. With the objective of
the first embodiment of the anti-compression process being to deter consumers from
compressing content in their music libraries, for example, and redistributing this
content over the Internet, a continuous degradation may not provide the reduction in
value required. Therefore, this example five of the first embodiment of
anti-compression includes the unique concept of adding and removing the aforementioned
discontinuities on a temporal basis in order to cause a compression encoder to switch
between one or more inappropriate and one or more appropriate encoder modes
throughout the portions of the audio that are so processed.

To illustrate the application of example five of the first anti-compression
embodiment, switching between M/S "joint stereo" coding mode and
R/L independent channel "discrete stereo" coding mode will be used. Figure 10 is an
illustrative embodiment of an M/S stereo encoder. Perceptual Model Processor 679
evaluates thresholds for the left and right channels. The two thresholds are then
compared on a frequency subband basis. For example, the Right and Left input
signals 669 and 671, respectively, could have been divided into 32 coder frequency
bands. In each band where the two thresholds differ between Right and Left by less
than some amount, typically but not necessarily 2 dB, perceptual encoder 673 is
switched into the M/S mode by the action of line 681 becoming a "1". In the M/S
mode perceptual encoder 673 uses M and S as its source data instead of R and L.
That is, the Right signal for that band of frequencies is replaced by the sum of the
Right and Left channels divided by 2, or the "Middle" signal, M=(L+R)/2, and the Left
signal is replaced by the difference of the Left and Right channels divided by 2, or the
"Side" signal, S=(L-R)/2. Thus, encoded outputs 675 and 683 are derived from M/S data,
not R/L data. The actual amount of threshold difference that triggers this substitution
will vary with bit rate constraints and other signal system parameters.
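
The per-band mode decision and the M/S substitution described above can be sketched as follows. This is a simplified illustration, not the encoder of Figure 10: the function names are hypothetical, the thresholds are supplied directly as per-band dB values rather than computed by a perceptual model, and the 2 dB limit is the typical figure mentioned in the text.

```python
def choose_modes(thr_left_db, thr_right_db, max_diff_db=2.0):
    """Return one flag per band: True selects M/S, False selects R/L."""
    return [abs(a - b) < max_diff_db for a, b in zip(thr_left_db, thr_right_db)]


def encode_band(left, right, use_ms):
    """Return the pair of values coded for one band.

    In M/S mode the pair is (M, S) with M=(L+R)/2 and S=(L-R)/2;
    otherwise the channels are passed through as (L, R).
    """
    if use_ms:
        return ((left + right) / 2, (left - right) / 2)
    return (left, right)


# Hypothetical thresholds for three bands, in dB.
flags = choose_modes([30.0, 40.0, 50.0], [31.0, 43.0, 50.0])
ms_pair = encode_band(0.5, 0.25, True)
rl_pair = encode_band(0.5, 0.25, False)
```

The original channels can be recovered from an M/S pair as L = M + S and R = M - S.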

The above selection of either M/S or R/L modes is actually the choice
between independent coding of the channels, mode R/L, or using the SUM and
DIFFERENCE channels, mode M/S. This decision is based on the assumption that
human binaural perception is a function of the output of the same critical bands at the
two ears. If the signals are such that they generate a stereo image, then the choice of
R/L coding is more appropriate. If the signals are similar then additional coding
gains, that is either a maintaining of encoded audio quality at a lower data rate or the
improvement of audio quality at the same data rate, may be exploited by choosing the
M/S coding mode. A convenient way to detect the similarity of the two channels
being encoded is by comparing the monophonic threshold between Right and Left
channels. If the thresholds in a particular band do not differ by more than a
predefined value, then the M/S coding mode is chosen. This mode is chosen because
this situation most often occurs when the amplitudes of the frequency components
which comprise both signals are very similar. Otherwise the independent mode R/L
is assumed. Note that associated with each band is a one bit flag that specifies the
coding mode of that band and that flag must be transmitted to the decoder as side
chain information. Also note that the coding mode decision is adaptive in time since
for the same band it may differ for subsequent segments, and is also adaptive in
frequency since for the same segment, the coding mode for subsequent bands may be
different. An illustration of a coding decision is given in Figure 13.

The MPEG-1 Layer 3 (MP3) Version 1.0 audio compression encoder,
developed by Fraunhofer-Gesellschaft IIS, which is used in the Opticom "MP3
Producer" Version 2.1 application, is an example of an audio compression encoder
which employs M/S stereo techniques as described above. The Fraunhofer MP3
audio encoder determines whether it should use the R/L or M/S mode on a frame by
frame basis and will switch into M/S mode when the averages of the monophonic
thresholds of the Right and Left channel subbands do not differ by more than a
predefined value. Although the Fraunhofer MP3 encoder evaluates and performs a
threshold comparison, the effect, as seen in the external behavior of the encoder, is that
the encoder will assume the M/S mode when the average energy in the frequency
components of the R channel is almost equal to the average energy in the frequency
components of the L channel. When the average energies of the frequency components
in the R and L channels differ by more than a certain amount, the encoder will go
into the R/L mode. When the difference between the average energies of the frequency
components in the R and L channels varies around this predefined level, the Fraunhofer
MP3 encoder can become confused and toggle between the M/S and R/L modes. This
uncertainty is exploited in this fifth example of the first anti-compression embodiment.
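
The toggling behavior can be illustrated with a toy frame-by-frame decision. This sketch does not reproduce the internal logic of any actual MP3 encoder; it assumes, purely for illustration, that the mode is chosen per frame by comparing the Right/Left difference in dB against a fixed hypothetical threshold, and the difference values are invented.

```python
def frame_modes(diff_db_per_frame, threshold_db=2.0):
    """Pick a mode for each frame from the Right/Left difference in dB."""
    return ["M/S" if abs(d) < threshold_db else "R/L" for d in diff_db_per_frame]


def count_switches(modes):
    """Number of frame boundaries at which the mode changes."""
    return sum(1 for a, b in zip(modes, modes[1:]) if a != b)


steady = [0.3, 0.5, 0.2, 0.4]    # channels clearly similar: stays in M/S
hovering = [1.8, 2.2, 1.9, 2.1]  # difference hovers about the threshold

steady_switches = count_switches(frame_modes(steady))
hovering_switches = count_switches(frame_modes(hovering))
```

When the difference sits well inside or well outside the threshold the mode is stable, whereas small variations about the threshold change the mode at every frame in this example.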

Figure 11 is a block diagram of an implementation of the fifth example
of the first anti-compression embodiment. It depicts the addition of phase and
amplitude discontinuities to a stereo audio signal. As will be shown, these
discontinuities cause the MP3 encoder, which follows the anti-compression processor
depicted, to be uncertain as to the choice of M/S or R/L mode. This results in
switching between these modes during the process of encoding the stereo audio
signal. As shown in Figure 11, which depicts anti-compression processor 627, Right
channel input signal 629 and Left channel input signal 631 are divided into low and
high pass signals by passing them through respective filters 633, 635, 637 and 639.
This results in Right channel high pass signal 715, Right channel low pass signal 717,
Left channel high pass signal 719 and Left channel low pass signal 721. Ignoring for
the present the processing performed by the network composed of 647, 645, 649, 653,
651, and 723, Left channel high pass signal 719 is further processed by the 180 degree
phase inverter 655 and added to the Left channel low pass signal 721 in mixer 643.
This 180 degree phase inversion is not included in the processing chain for Right
channel high pass signal 717 which is added to Right channel low pass signal 715 in
mixer 641. Low pass filter block 633, high pass filter block 635, high pass filter
block 637 and low pass filter block 639 serve to add phase and amplitude
discontinuities around a predefined frequency. In the implementation shown, this
frequency has been chosen to be approximately 1600 Hz. Note that 1600 Hz has been
chosen for illustrative purposes only and could have been chosen to be any frequency
above or below 1600 Hz. How effective the chosen frequency will be depends on the
audio signals being processed. The phase and amplitude characteristics of these filter
blocks are shown in Figure 12.
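
The signal flow of splitting each channel into low and high bands, inverting the high band of one channel, and recombining can be sketched as follows. This is a deliberately simplified illustration and not the filter design of Figures 11 and 12: it substitutes a one-pole low-pass filter and its complement for the steep filters described, it omits the cross-channel mixing network entirely, and the function names and sampling rate are assumptions. Because the substituted bands are exactly complementary, the Right channel here is reconstructed unchanged and does not show the amplitude dip that separate steep filters would produce.

```python
import math


def split_bands(x, cutoff_hz=1600.0, rate_hz=44100):
    """Split x into (low, high) using a one-pole low-pass and its complement."""
    a = math.exp(-2 * math.pi * cutoff_hz / rate_hz)
    low, state = [], 0.0
    for v in x:
        state = (1 - a) * v + a * state   # one-pole low-pass recursion
        low.append(state)
    high = [v - l for v, l in zip(x, low)]
    return low, high


def anti_compress(right, left, cutoff_hz=1600.0, rate_hz=44100):
    """Recombine the bands, inverting the high band of the Left channel only."""
    r_low, r_high = split_bands(right, cutoff_hz, rate_hz)
    l_low, l_high = split_bands(left, cutoff_hz, rate_hz)
    right_out = [lo + hi for lo, hi in zip(r_low, r_high)]
    left_out = [lo - hi for lo, hi in zip(l_low, l_high)]  # 180 degree inversion
    return right_out, left_out


# Hypothetical short test signal applied to both channels.
signal = [1.0, 0.0, -1.0, 0.5]
right_out, left_out = anti_compress(signal, signal)
```

Even with identical inputs, the two outputs differ above the crossover region, which is the inter-channel disparity the processing is intended to create.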

Of course, the exact characteristics of these discontinuities will be
dependent on the filter characteristics chosen and how the falling slopes of the low
pass filters and the rising slopes of the high pass filters are related. In the
implementation depicted, the falling slopes of low pass filters 633 and 639 and the
rising slopes of high pass filters 635 and 637 have been chosen to be quite sharp,
about 60 dB per octave, and their cross over point 659 has been chosen to be -6 dB
from the flat portion of the filters' frequency response. This selection of filter
characteristics is for a specific example only. Other filter characteristics can
alternatively be chosen. However, this set of characteristics will cause the frequency
spectrum discontinuities injected into the Right and Left signals to assume minimum
audibility in the uncompressed Right and Left stereo signal. They also can cause the
M/S-R/L selection determination in the subsequent MP3 encoder process to be
uncertain. As can be seen from Figure 12, low pass filter falling slope 657 causes an
amplitude dip in both the Right and Left Channels that begins at about 1500 Hz,
before the high pass filter rising slope 661 has an opportunity to compensate for this
loss in signal energy. Also, Figure 12 depicts rapidly changing non-linear phase
responses 665 and 669 which culminate at an inflection point 667. This inflection
point occurs at approximately 1600 Hz. When the R and L signals 629 and 631,
respectively, are passed through this processing, by being separated into high and low
bands and individually recombined through the action of mixers 641 and 643
respectively, these rapidly occurring, non-linear, amplitude and phase changes,
centered around a 1600 Hz frequency, recombine in a constructive and destructive
manner and result in transient changes in amplitude in processed Right Channel 775
and processed Left Channel 779 of Figure 11. In the case of processed Left Channel
779, because of the action of inverter 655, these transient changes in amplitude are
shifted in phase and therefore assume different amplitudes and timing as compared to
the transients which appear in processed Right Channel 775.

If the average thresholds of the Right and Left Channels of a musical
selection, which is to undergo Anti-Compression processing, are either solidly within
the predetermined threshold difference band defined by a subsequent MP3 encoding
process, or are substantially outside this difference band, the addition of the above
described transients may be insufficient to cause the MP3 M/S - R/L analysis and
detection mechanism to become confused and switch between M/S and R/L modes. If
the Right and Left average thresholds are within this difference band, the MP3
encoder would remain in the M/S mode. If they are substantially outside this
difference band, the MP3 encoder would continuously assume the R/L mode. Thus, it
is preferred that a narrow threshold band be maintained between the channels in order
to add Anti-Compression characteristics to the input audio signal, using the example
Anti-Compression processing scheme. This situation is resolved by the cross channel
mixing processing network composed of circuit blocks 647, 645, 649, 653, 651, and
723 of Figure 11. For the MP3 encoder in this example, which chooses either the M/S
or R/L mode depending on the difference between the average threshold derived from
the thresholds of each coder frequency band in each channel, this network is adjusted
such that the difference between the average thresholds of the Right and Left channels
is forced to reside in the range of M/S - R/L switch uncertainty, where the MP3
encoder will switch between the two modes if the thresholds of the music vary.
Natural variations in the Right and Left channel thresholds of the music being
encoded will cause this to occur.

The effect these transient changes have on the MP3 encoding process
is best visualized when the processed R and L signals, 775 and 779, respectively, are
converted to M and S signals. Recall that M = (L+R)/2 and S = (L-R)/2. Figure 14
depicts M and S signals, associated with a musical selection called Babyface, before
and after Anti-Compression processing 627 shown in Figure 11. Original M and S
input signals 691 and 695, respectively, are processed by Anti-Compression processor
627 into M and S output signals 693 and 697, respectively. Note transients 699, 701,
703, 705, 707 and 709. It is these signal discontinuities, which are directly derived
from the Anti-Compressed Right and Left Channel signals, that cause the MP3
process to be uncertain as to the mode it should be in. Also note that if the MP3
encoder were to stay in one mode, the level of disturbance to the listener, caused by the
action of the Anti-Compressed signal on the MP3 encoder, would be much lower
than if the MP3 encoder continually switched between modes. It is for this reason that
audio quality modification, along with audio quality variation, are both unique
characteristics of an Anti-Compressed audio signal that has undergone subsequent
audio compression encoding and decoding.

The methods and apparatus associated with the implementation of the
first embodiment of the present invention are generalized with respect to Figure 15.
An audio signal 757 is input to a Combiner 753 and a Psychoacoustic Analyzer
761. The Psychoacoustic Analyzer 761 determines the acoustic elements that
comprise input audio signal 757, in terms of both spectral components and the timing
of these spectral components, and inputs this data, which appears on line 765, to a
Degradation Generator 763, a Forcing Function Generator 791 and a Masking
Function Generator 803. The Degradation Function Generator 763, Forcing Function
Generator 791 and Masking Function Generator 803 all employ the data on line 765
to create signals 755, 751 and 801, respectively, that are combined with the original
audio signal in the Combiner 753. A Degradation Function Input 755 is created such
that it is minimally audible in the Anti-Compressed audio output appearing on line
759, but, following a compression process, is perceptible in the decompressed version
of this signal. A Forcing Function Input 751 is also created such that it is minimally
audible in the Anti-Compressed audio output appearing on line 759, but in this case
the objective is to force audio compression encoding processes, which subsequently
act on the Anti-Compressed audio output 759, to employ encoding techniques or
parameters during the encoding process that are inappropriate for the proper encoding
of the Anti-Compressed audio output 759. Masking Function Input 801 serves the
purpose of reducing the audibility and/or increasing the acceptability of the additional
signals added to the input audio data stream by the Forcing Function and/or
Degradation Function generators. Note that the Forcing Function 751 is also input to
the Degradation Generator 763 and the Masking Function Generator 803. Therefore,
in addition to causing an audio compression encoder to be uncertain as to what mode
it should employ for encoding the Anti-Compressed audio signal appearing on line
759, or be forced into an inappropriate mode for encoding the Anti-Compressed audio
signal appearing on line 759, Forcing Function 751 also provides timing information
to Degradation Generator 763 and Masking Function Generator 803. This permits the
Degradation Function 755 and the Masking Function 801 to be inserted in the
Anti-Compressed signal 759 at the time or times during which they will be most effective
in causing the desired effect. In the case of the Degradation Function 755 this time or
times are chosen to cause the Degradation Function to be audible after a
compression-decompression cycle and non-offensive in the Anti-Compressed (ACTed) output
signal 759. In the case of the Masking Function 801, this time or times are chosen to
reduce the audibility of the Degradation Function and/or the Forcing Function in
ACTed Audio Output 759.

Two items should be noted. First, it is sometimes unnecessary to
include a separate Degradation Function and a separate Masking Function in Anti-
Compressed output signal 759 in order to achieve the desired effect after a
compression-decompression cycle. The act of a Forcing Function placing the audio
compression encoder into a mode which is inappropriate for the proper processing of
the original audio signal can, by itself, be sufficient to cause the decoded,
decompressed version of the original audio signal to display the desired degradation.
If the Forcing Function is sufficiently inaudible to the listener not to be distracting,
the addition of a separate Masking Function would be unnecessary. Second, the
Masking Function could be perceivable by a human listener, listening to an audio
reproduction of the ACTed Audio Output 759, and still be acceptable. This case
would occur if the Masking Function added to 759 is chosen to complement the
artistry of the music signal appearing on 759. Such would be the case if the Masking
Function were chosen to be, for example, a synthesized or naturally occurring trumpet
sound that contained frequency components of the appropriate amplitude to mask the
audibility of the inserted Degradation and/or Forcing Functions, and said Masking
Function were inserted into an appropriate musical passage.

The processing elements defined in the generalized Anti-Compression
process depicted in Figure 15 are often encountered as compound elements that
perform one or more of the Anti-Compression processing functions. For example, in
the case of the fifth example of the first Anti-Compression embodiment depicted in
Figure 11, it can be seen that Forcing Function 751, produced by Forcing Function
Generator 791 of Figure 15, is created by the actions of the Low Pass Filters 633 and
639 and the High Pass Filters 635 and 637. These elements add the temporal and
spectral discontinuities that are desirable to cause a subsequent MP3 encoding process
to switch between M/S and R/L modes. Thus they provide the forcing function
required to cause audio compression encoder mode uncertainty. It can also be seen
that the Degradation Generator function 763 of Figure 15 is provided by the Inverter
655 of Figure 11. This element causes spectral content above the 1600 Hz inflection
point to destructively add during the creation of the M signal (M = R + L) when the
MP3 encoder process is in the M/S mode, thus causing a loss of high frequencies in
the M signal. It also causes spectral content above 1600 Hz to constructively add
during the creation of the S signal (S = R - L, which for the inverted content becomes
S = R - (-L) = R + L) when the MP3
encoder process is in the M/S mode. Since in the M/S mode, the MP3 encoder
provides the majority of the bits to the M signal, and the M signal has been degraded
above 1600 Hz, the resulting decoded M and S signals will provide R and L signals
that do not display the same high frequency characteristics as the original Anti-
Compressed R and L signals appearing on lines 775 and 779 of Figure 11. Thus it can
be seen that the Inverter 655 serves the same purpose as the Degradation Generator
763 of Figure 15. In addition, the function of the Combiner 753 of Figure 15 is
provided by adders 641, 643, 645, and 723 of Figure 11. The only functions provided
for in Figure 15 and not present in Figure 11 are those of the Psychoacoustic Analyzer
761 and the Masking Function Generator 803. These elements, which enhance the
Anti-Compression process, are not included in the simple implementation of example
5 of the first Anti-Compression Embodiment. 
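
The cancellation attributed to the Inverter can be checked numerically. The
following is a minimal sketch only, under stated assumptions: both channels carry
identical content, the inverter is modeled as an idealized FFT-domain polarity flip
above 1600 Hz rather than the filters and adders of Figure 11, and M = R + L and
S = R - L are formed as in the text. The sample rate, the test tones and the
function names invert_above and band_energy are hypothetical.

```python
import numpy as np

FS = 44100          # sample rate in Hz (assumed for illustration)
SPLIT_HZ = 1600.0   # inflection point named in the text

def invert_above(x, fs=FS, split_hz=SPLIT_HZ):
    """Flip the polarity of all spectral content above split_hz (idealized inverter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > split_hz] *= -1.0
    return np.fft.irfft(spectrum, n=len(x))

def band_energy(x, lo_hz, hi_hz, fs=FS):
    """Sum of squared spectral magnitudes between lo_hz and hi_hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = (freqs >= lo_hz) & (freqs < hi_hz)
    return float(np.sum(np.abs(spectrum[band]) ** 2))

# Test signal: a 500 Hz tone plus a 5000 Hz tone, identical in both channels.
t = np.arange(4096) / FS
mono = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 5000 * t)
right = mono
left = invert_above(mono)   # left channel with its content above 1600 Hz inverted

m = right + left            # M signal: content above 1600 Hz cancels
s = right - left            # S signal: content above 1600 Hz adds

high_m = band_energy(m, SPLIT_HZ, FS / 2)
high_s = band_energy(s, SPLIT_HZ, FS / 2)
high_r = band_energy(right, SPLIT_HZ, FS / 2)
```

In this idealized case high_m is essentially zero while high_s is about four times
high_r (twice the amplitude), so an encoder that spends most of its bits on M is
spending them on a signal that has lost its high frequencies.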

One important application of the signal modification system 511
depicted in Figure 1 is illustrated in Figure 5. After the music or other program
material for reproduction on a Compact Disc ("CD") is assembled as a digital file,
indicated by a block 551, that file is processed by one or more of the techniques
described above to add signal data to the audio signals of the file before making a CD
master recording 553 from it. The content of the resulting replica CDs that are sold to
consumers cannot then be compressed without a significant loss of quality of the
content signals when decompressed. The same techniques can also be used when
storing or distributing audio content by other means such as with audio tape, as a
component of a Digital Video Disc ("DVD"), or as the digital or analog sound track
on a motion picture release print. Since such compression is currently required before
the audio content can be stored or distributed in several ways, such as storing in non-
volatile semiconductor memory cards or transmission over the Internet or other
communications network, unauthorized copying and distribution of the content is thus
greatly discouraged. The degraded music or other audio content is of little value.

The block diagram of Figure 6 illustrates a use of the present invention
in the distribution of music or other audio content over the Internet in a manner that
greatly discourages copying and re-distribution of the content by the recipient over the
Internet. A master audio source file 555 is compressed, as indicated by a block 557,
and then encoded, as indicated by a block 559, in order to provide a secure
transmission that can be decoded only by the intended recipient. The compressed and
encoded digital signal is then transmitted over the Internet 561 to the intended
recipient who, in the normal case, has paid the content provider for it. The recipient
must then decode the incoming signal, as indicated by a block 565, by use of a key or
other accepted technique, and then decompress it, as indicated by a block 567. At this
point, however, the master audio source file 555 is available to the recipient in a
decoded and decompressed form that can easily be distributed to others over the
Internet by a recipient who is willing to violate the copyright of the content provider.
But since such unauthorized distribution is practical only if the content file is first
compressed again by the recipient, noise or other data is added to the decoded and
decompressed content file by the recipient's audio player or other utilization device,
as indicated by a block 569. The recipient can, however, reproduce the audio content
without degradation after the audio signal has been modified. The content, in the
form of an analog or pulse code modulated ("PCM") signal, for example, is applied to
standard audio circuits 571 that drive a loudspeaker or headphones.

Such a signal addition in the recipient's utilization device is made
effective when the recipient has no effective choice but to receive an output of the
content from his or her utilization device after the audio signal has been modified. In
order to prevent the recipient from accessing the content signal before the signal is
modified in the step 569, the signal modification is preferably performed in a
physically sealed module 115' that also includes the decoding function 565. A key
necessary for decoding the signal is included within the module in a manner that
renders it inaccessible to the recipient. Since the content provider can make it a
condition of supplying the music or other content that the recipient use such a sealed
module to decode the transmitted encoded content, the added security against the
recipient being able to easily redistribute the audio content is conveniently included in
the same sealed module. As can be seen from Figure 6, a decoded digital signal of the
content is not available except within the sealed module 115'. An input to that module
is an encoded signal which the recipient cannot decode except with use of the module.
An output of the module 115' presents the content in a standard format, such as an
analog or PCM signal, which could normally be re-digitized or otherwise manipulated
by the recipient for unauthorized redistribution. But since such redistribution
normally requires that the signal be compressed prior to doing so, the noise or other
data that is added to the output signal by the processing step 569 makes that highly
undesirable or even impossible.

The sealed module 115' is a variation of the module 115 described in
the aforementioned Secure Transmission Patent Applications, with a specific version
shown in Figure 7 hereof, where the reference numbers are the same as used in the
Secure Transmission Patent Applications but with a prime (') added for corresponding
elements that are modified herein. The primary, and perhaps only, component of the
sealed module 115' is a digital signal processor ("DSP") integrated circuit chip 135'.
The primary difference here is the inclusion of signal modification software 573 in its
non-volatile memory 147' in a manner that the user cannot access that software or
defeat its use to add the anti-compression noise or other data before an audio signal is
made accessible to the user (recipient) at an output of the module.

As described in the Secure Transmission Patent Applications, the
module 115' is preferably implemented in the form of a small key card that is made
personal to a particular user by storing decryption (decoding) key(s) in its memory
147' that are unique to the user. The key card is removably inserted into the user's
audio player when connected to the Internet, a kiosk in a music store, or other content
providing device, in order to purchase content from a provider with use of the user's
key(s) stored within the card. The key card is also inserted into the recipient's player,
as well as others, in order to allow the received content to be played by the recipient
while restricting the extent to which the content can be transferred to or played by
others. By the controlled addition of noise or other data to the content signal output
of the sealed key card, according to the techniques described herein, unauthorized
distribution and use are further technically restricted.

Second Embodiment: Allowing one Compression and Decompression of an Audio 
Signal 

Figure 8 shows a second embodiment of the present invention. In this
second embodiment an encode/decode compression algorithm pair is described which
has the characteristic of producing compressed audio data that can be decompressed
for listening, but cannot be compressed with quality for a second time, thus
effectively disallowing retransmission of the audio data over the Internet. A
compression algorithm with this characteristic is called a "one generation" algorithm.
The use of a one generation algorithm serves as an alternative to including anti-
compression signal modification in the recipient's player, as described with respect to
Figures 6 and 7. As depicted in Figure 8, an audio source file 577 is compressed with
an available algorithm, as indicated by a block 579, and some noise or other data for
the same purpose is added, as shown by a block 581. The amount by which the audio
signal is increased by block 581 is below that which significantly affects the quality of the
content when decompressed by the user. But it is sufficient to cause the quality of the
content signal to be significantly degraded if the decompressed signal is again
compressed with the type of algorithm described previously. In either of the versions
of the first embodiment shown in Figures 6 and 7 or that of the second embodiment
shown in Figure 8, electronic distribution of music or other content is facilitated. It
should be noted that the block 581 can be combined with the block 579 to form a
single stage compression algorithm which provides a compressed audio output with
anti-compression signal components added. In this case, a "calculate signal increases"
block, such as block 521 of Figure 1, and an "adder" block, such as block 525 of
Figure 1, would be incorporated into the compression algorithm itself, following the
compression algorithm's non-linear quantizer block and preceding the compressed
audio output from the compression algorithm.

A second approach applicable to the one generation codec embodiment
described above employs the fact that compression algorithms inherently add
quantization noise to the original signal during the compression process itself. As
previously described, this is due to the fact that individual frequency components of the
signal are more coarsely digitized in an effort to reduce the number of bits used to
describe the signal. This leads to "generation loss" when "cascading" compression
processes. When compression algorithms are cascaded, that is, when a signal is compressed,
then decompressed and then compressed and decompressed once again, the resulting
signal is naturally noisier than the original signal. The second embodiment of the
present invention can take advantage of the mechanisms that produce generational
loss, by employing those techniques that inherently modify the signal. These
mechanisms can be used to naturally produce an output that, for example, has
embedded noise which is very close to the masking thresholds depicted in Figure 3.
Such a result could be obtained by employing a non-linear quantizer in the
compression algorithm that is adjusted to more coarsely quantize the individual
frequency components of the signal. Thus, this output signal would not be able to
undergo a second compression/decompression cycle without the added noise from the
second compression cycle being above the masking threshold, and thus being audible
in the output signal.
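
The generation loss described above can be illustrated with a toy model. The sketch
below is an assumption-laden illustration, not any actual compression algorithm: it
uses a block FFT with a uniform quantizer in place of a perceptual codec, and the
second cycle is given a different block alignment on the assumption that a later
encoder has no knowledge of the framing used by the first. The block length, step
size and the names codec_cycle and noise_power are hypothetical.

```python
import numpy as np

BLOCK = 256  # transform block length in samples (illustrative)

def codec_cycle(x, step, offset=0):
    """One compress/decompress cycle: block FFT, uniform quantization, inverse FFT."""
    y = x.copy()
    for start in range(offset, len(x) - BLOCK + 1, BLOCK):
        spectrum = np.fft.rfft(x[start:start + BLOCK])
        quantized = (np.round(spectrum.real / step)
                     + 1j * np.round(spectrum.imag / step)) * step
        y[start:start + BLOCK] = np.fft.irfft(quantized, n=BLOCK)
    return y

def noise_power(reference, degraded):
    """Mean squared difference between the reference and the degraded signal."""
    return float(np.mean((reference - degraded) ** 2))

rng = np.random.default_rng(0)
original = rng.standard_normal(4096)

first = codec_cycle(original, step=4.0)                     # first generation
second = codec_cycle(first, step=4.0, offset=BLOCK // 2)    # second generation

noise_gen1 = noise_power(original, first)
noise_gen2 = noise_power(original, second)
```

If the first cycle is tuned so that noise_gen1 sits just under the masking threshold,
the additional noise contributed by the second cycle (noise_gen2 exceeds noise_gen1
in this sketch) is what would push the total above that threshold.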

A third approach to implement the second embodiment of the present
invention uses the fact that compression algorithms with improved generational
qualities often use additional techniques to reduce bit requirements without adding
quantization noise. These techniques can provide the basis for further one generation
functionality methods. For example, some algorithms, such as the Dolby AC-3
compression algorithm, employ a technique called Huffman encoding in addition to
reduced quantization resolution on a frequency band by frequency band basis.
Huffman encoding uses the elimination of redundancies in the audio signal over time
to reduce data requirements. It decreases the number of bits needed to describe an
audio signal by first encoding the audio signal using complete information and then
only using differences in this information to describe the audio signal over a defined
sequential time interval. Compression algorithms using such a technique have better
generational characteristics than those that do not because they can use finer
frequency band quantization and still maintain the desired compression ratio. They
suffer, however, from having reduced audio data time resolution. The underlying
assumption that significant changes in input audio signal characteristics will not take
place over the time window used by the Huffman encoding process can be used by
the one generation compression process. One example of such use is the addition by a
one generation audio compression process of short duration audio data or noise bursts
to its output audio data stream. It is well known in the art that as an audio data sample
is reduced in duration it must be of greater amplitude to be perceived by the listener
when in the presence of competing sounds. For example, an 8 kHz tone with a
duration of 1 millisecond, beginning 2 milliseconds after the initiation of 60 dB of
Uniform Masking noise, must be 33 dB greater in amplitude as compared to an 8 kHz
tone with a duration of 20 milliseconds, beginning 2 milliseconds after the initiation
of 60 dB of Uniform Masking noise, to be perceived by the human ear. This was
reported by H. Fastl in 1976 in his paper 'Temporal masking effects: I. Broad band
masker' which appeared in Acustica, 35(5), 287-302. Audio data samples which
occur randomly in time, or at chosen predetermined time intervals, and are short
enough in time duration will therefore not be easily sensed by the listener, but will be
detected by an audio compression process attempting to compress the audio signal.
Using some of the specific techniques described above, as exemplified in Figures 3
and 4, will further hide the randomly added audio samples from a listener. If this
audio compression process employs Huffman encoding, these pulses will
asynchronously occur at the time the Huffman encoding process is preparing the data
which is used as the reference for subsequent audio difference samples, and cause
these subsequent samples to incorrectly represent the audio being compressed. In the
case of Dolby AC-3, the Huffman encoding window is 30 milliseconds. This means
that the output compressed audio will be corrupted for 30 milliseconds each time the
Huffman reference information is spuriously altered by these embedded short audio
noise bursts. This corruption will represent a significant degradation of the
decompressed audio signal.
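
The effect of a burst landing on the reference data can be illustrated with a toy
coder. This is a sketch under loud assumptions: it is not the Dolby AC-3 bitstream
and does not implement Huffman coding; it only models a coder that derives a
quantizer scale from the first block of a window and reuses that scale for the rest
of the window. The block size, window length, burst amplitude and function names
are hypothetical.

```python
import numpy as np

BLOCK = 256             # samples per block (illustrative)
BLOCKS_PER_WINDOW = 6   # blocks sharing one reference (illustrative)
LEVELS = 16             # quantizer levels per unit of reference amplitude

def encode_decode_window(window):
    """Quantize a whole window with a step size taken from its first block only."""
    reference_peak = np.max(np.abs(window[:BLOCK]))
    step = reference_peak / LEVELS
    return np.round(window / step) * step

def later_block_error(original, decoded):
    """Mean squared error over every block after the reference block."""
    diff = original[BLOCK:] - decoded[BLOCK:]
    return float(np.mean(diff ** 2))

n = np.arange(BLOCK * BLOCKS_PER_WINDOW)
clean = 0.1 * np.sin(2 * np.pi * 440 * n / 44100)

treated = clean.copy()
treated[10] += 1.0   # one-sample burst inside the reference block

err_clean = later_block_error(clean, encode_decode_window(clean))
err_treated = later_block_error(treated, encode_decode_window(treated))
```

Although the burst occupies a single sample, err_treated is larger than err_clean
across all of the later blocks in the window, which mirrors the text's point that one
short burst can corrupt the coded audio for the full duration of the window.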

As follows from the previous paragraph, the addition of embedded short noise
bursts can be used to anti-compress an audio signal that has not been previously
compressed. Any compressed and subsequently decompressed version of an audio
signal that has been anti-compressed in this manner will thereby be degraded as
compared to the original audio signal. By adding the frequency domain equivalent of
these short noise bursts to, for example, the MP3 compressed version of an audio
signal, these bursts will be decoded by a subsequent MP3 decoder as if they were part
of the original signal. Since, as previously described, these noise bursts were masked
by the original signal, the presence of these noise bursts in the decoded version of this
encoded audio stream will be difficult to detect. However, if this decoded audio data
stream is once again subjected to a compression encoding process, these bursts will
cause the disruption in audio encoding function previously described, and the
decompressed output from this recompressed audio stream will be degraded as
compared to the original decompressed audio signal. Keep in mind that in the case of
the first decoding of the compressed audio stream, the noise bursts have been added
after all compression processing has been completed, and therefore the noise bursts
have not disrupted any of the compression processing employed. However, in the
case of the second decoding, the noise bursts were part of the audio signal being
compressed and therefore disrupted the audio compression encoding process as
previously described. It is for this reason that the subsequent decoded audio stream
from this recompressed data stream is degraded. It is important to point out that
although this example employs noise bursts as the means to cause audio compression
encoder misbehavior, any of the anti-compression techniques discussed in this
disclosure could be used. The unique concept of embedding data within a compressed
audio or video signal that is decoded by a subsequent decoding process as if it were
part of the originally encoded data, and which is in a form that is compatible with the
compressed audio or video data which comprises said compressed audio or video data
stream, is a fundamental part of the one-generation codec idea that comprises the
second embodiment of the present invention.

As previously illustrated, some of the specific techniques described
add sufficient noise to an audio signal at various frequencies and amplitudes to
adversely affect application of a subsequent compression algorithm, but not enough to
discernibly affect the quality of the signal without such further compression. A fourth
approach applicable to the one generation algorithm of the second embodiment of the
current invention, shown in Figure 8, uses a different method of accomplishing similar
ends. It employs the concept of temporal unmasking. As described above, a usual
compression encoding algorithm operates on successive, uniform blocks 529, 531, etc.
of digital samples of the signal 527 (Figure 2). If these blocks are not uniform,
information defining the timing and number of bytes of data associated with each of
these blocks of digital samples must be sent along with the compressed data for use
by the compression decoding algorithm in order to reconstruct a replica of the signal
527. It is the alteration of this block timing and block size that can constitute the
noise or data added by block 581 in the embodiment of Figure 8, either alone or in
combination with some level of spectral alteration.

In one popular compression process, each successive block of audio
data includes 256 new time samples as well as the previous 256 time samples. This
block of 512 overlapping samples is windowed and the data in this window, which
moves in time, is transformed into 256 unique frequency coefficients. In addition, the
input signals are analyzed with a high frequency bandpass filter, to detect the presence
of transients. This information is used to adjust the block size of the data
transformed, restricting quantization noise associated with the transient to within a
small temporal region about the transient, avoiding temporal unmasking. The method
under consideration utilizes the fact that the changing data block size and/or
windowing time position, occurring on compression encode, must be transmitted to
the decompression decoder in order to accurately decompress the encoded audio
signal. One method of doing this is through the use of side chain information,
although other methods, which embed this information into the compressed audio data
stream itself, may be employed. This permits the decoder to accurately synchronize
the decode operation with the varying encoded data block size and assure the same
block size is employed for decode as was used for encode, thus avoiding temporal
unmasking. The present method takes advantage of the fact that this additional side
chain information is not included in the decompressed audio data stream and is thus
not available to subsequent compression processes.
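
The block structure described above (256 new samples joined to the previous 256,
windowed, and reduced to 256 coefficients) can be sketched as follows. The text does
not name the transform or the window, so the modified discrete cosine transform and
the sine window used here are assumptions chosen because they yield 256 coefficients
from 512 overlapping samples; the function names are hypothetical.

```python
import numpy as np

HOP = 256             # new samples per block, as in the text
BLOCK = 2 * HOP       # 512-sample window: 256 previous + 256 new

def overlapped_blocks(x):
    """Return every 512-sample block, advancing by 256 samples each time."""
    count = (len(x) - BLOCK) // HOP + 1
    return np.stack([x[i * HOP:i * HOP + BLOCK] for i in range(count)])

def mdct(block):
    """Windowed MDCT: 512 time samples in, 256 frequency coefficients out."""
    n = np.arange(BLOCK)
    k = np.arange(HOP)
    window = np.sin(np.pi * (n + 0.5) / BLOCK)   # sine window (an assumed choice)
    basis = np.cos(np.pi / HOP * np.outer(k + 0.5, n + 0.5 + HOP / 2))
    return basis @ (window * block)

rng = np.random.default_rng(1)
signal = rng.standard_normal(2048)
blocks = overlapped_blocks(signal)
coefficients = np.array([mdct(b) for b in blocks])
```

For a 2048-sample input this produces seven overlapping 512-sample blocks and seven
sets of 256 coefficients; the second half of each block reappears as the first half
of the next.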

To exploit this circumstance, the present method calls for the one
generation compression algorithm under consideration to place transient noise or data
at locations in the audio data stream being compressed which are synchronized with the
sample block size and sample block timing used during the process of transforming
the audio data stream from the time to the frequency domain. This transient
extraneous data is tailored such that the audio data present in the audio signal being
compressed, which occurs immediately before and immediately after the transient,
masks the audibility of these transients, so they will not be perceptible to the listener
when the audio signal is decompressed. In addition, the one generation compression
algorithm under consideration uses a varying sample block size during the process of
transforming the data from the time to the frequency domain. Data regarding this
varying block size, as well as data regarding where transients were inserted into the
audio stream, are transmitted to the decoder by one of several means well known in
the art. This data will permit the original audio signal to be decompressed and
reproduced with high quality. No transient artifacts would be heard by a listener.
However, since block size and transient timing information is not included with the
decompressed audio data stream, a subsequent compression process, whether it uses a
fixed size window, multiple fixed sized windows or dynamically sized windows to
analyze the spectral and temporal components of the audio signal being
compressed, will be unable to select the best window size for transient response, or
synchronize the windowing function to the transients that were inserted in the
uncompressed, treated audio stream. This will cause these transients to be temporally
unmasked and therefore audible at the output of the second compression-
decompression cycle. This temporal masking embodiment, as the others, is
advantageously implemented in the system described in the above referenced Secure
Transmission Patent Applications, in order to prevent the consumer from having access
to the digital signals from the first compression process before they are converted to
PCM or analog signals.
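
Temporal unmasking can be illustrated numerically. The sketch below is a simplified
model under stated assumptions: a plain FFT with a uniform quantizer stands in for
the transform coder, blocks do not overlap, and the transient is a short noise burst
preceded by silence. It compares a long block, as might be chosen by an encoder that
does not know where the transient is, with short blocks that confine the transient.
All names and values are hypothetical.

```python
import numpy as np

def quantize_block(block, step):
    """Transform a block, quantize its spectrum uniformly, and transform back."""
    spectrum = np.fft.rfft(block)
    quantized = (np.round(spectrum.real / step)
                 + 1j * np.round(spectrum.imag / step)) * step
    return np.fft.irfft(quantized, n=len(block))

def code_with_block_size(x, block_size, step):
    """Code the signal in consecutive, non-overlapping blocks of block_size samples."""
    out = np.empty_like(x)
    for start in range(0, len(x), block_size):
        out[start:start + block_size] = quantize_block(x[start:start + block_size], step)
    return out

# Silence followed by a short transient starting at sample 400 of 512.
rng = np.random.default_rng(2)
signal = np.zeros(512)
signal[400:432] = rng.standard_normal(32)

STEP = 1.0
long_block = code_with_block_size(signal, 512, STEP)     # one long block
short_blocks = code_with_block_size(signal, 128, STEP)   # four short blocks

# Quantization noise that lands in the silence well before the transient.
pre_echo_long = float(np.sum((long_block[:384] - signal[:384]) ** 2))
pre_echo_short = float(np.sum((short_blocks[:384] - signal[:384]) ** 2))
```

With the long block, quantization noise is spread over the whole block and appears
in the silence ahead of the transient, where nothing masks it. With the short blocks
the first 384 samples are coded exactly, so pre_echo_short is zero.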

In a fifth example of the one generation codec embodiment, phase,
timing and/or amplitude discontinuities are inserted into one or more of the channels
of the encoded audio. These discontinuities are designed to be as imperceptible to the
human ear as possible when they appear in the decompressed audio. However, they
are tailored to cause the initiation of different compression processing modes in a
subsequent encoding process, as described in the fifth example of the first anti-
compression embodiment of this invention. The incorporation of these discontinuities
in the codec allows either for the discontinuities to be embedded in the encoded signal
at the time of encoding, or for discontinuity information to be passed from the encoder
to the decoder by carrying the additional discontinuity data along with the encoded
data stream in the data structure of the encoded signal.

In the case where discontinuities are embedded into the encoded signal
at the time of compression encoding, encoded discontinuities are added to the
encoded, compressed audio data itself, such that the decompression decoder will pass
these discontinuities into the decompressed data stream without acting upon them,
other than to decode them and convert them from the frequency domain to the time
domain. They will therefore appear in the decompressed data stream with minimal or
no alteration and be difficult to perceive in the decoded data stream. However, once
this decoded data stream is again compressed and subsequently decompressed, these
discontinuities cause this second decoded data stream version to be degraded, as
previously described, compared to the audio signal that was first encoded. Figure 16
depicts an implementation of this unique One Generation encoder approach. A Right
audio input channel 821 and a Left audio input channel 823 are simultaneously
input into the ACT processing scheme, beginning with a Psychoacoustic Analyzer
block 761 and ending with a Combiner block 753, and the audio compression
encoding scheme, beginning with a Buffer block 825 and ending with a Bit Stream
Composing and Buffering block 829. The ACT processing scheme depicted in Figure
16 is the same method previously described and depicted in Figure 15 of the present
patent specification. The audio compression encoding scheme depicted in Figure 16
is fully described in the previously mentioned United States Patent 5,285,498, of
James D. Johnston, as illustrated in Figure 7 of the Johnston patent's specification.
ACT Data Signal 827 is equivalent to ACTed Audio Output 759 of Figure 15 hereof,
less the PCM Audio Input 757. As shown in Figure 15, the ACTed Audio Output is
composed of a Forcing Function 751 combined with a Masking Function 801, a
Degradation Function 755 and a PCM Audio Input 757. Thus, 827 represents the
ACT signal derived from the aforementioned Anti-Compression signal components
before they are combined with the input signal which is undergoing Anti-
Compression processing.

The ACT Data Signal 827 is then input to an Encoder and Formatter
block 817 to be converted into the frequency domain and formatted such that it can be
combined in Combiner blocks 831 and 833 with the transform coded and quantized
version of the input audio signals appearing on lines 835 and 837. The combined
encoded audio and Anti-Compression elements are then passed through Huffman
Coding block 839 to losslessly remove redundant information. Note that the addition
of Anti-Compression data elements, which appear on lines 815 and 813, to the encoded
audio signal components that appear on lines 835 and 837 will, in general, increase
the data rate of the encoded signal. Since the output data rate from the compression
encoder is fixed, the increase in data rate needs to be compensated for by reducing the
amount of data which comprises the encoded audio data stream itself. This
compensation is effectuated by the use of line 819 ("# Bits"), which feeds back the
combined audio and Anti-Compression data rate to an Iterative Quantization block
841. The information provided by line 819 causes the block 841 to increase the
quantization coarseness of the encoded audio signal, thereby reducing the encoded
audio data rate and compensating for the additional Anti-Compression data elements
that have been placed in the encoded audio signal. After Bit Stream Composing and
Buffering by a block 829, the resulting encoded compressed audio signal is now in a
form that can be decoded and decompressed by any appropriate decoder using
techniques which are well known in the art. However, the decoded signal produced
by these decoders will be unique in that the decoded audio output delivered will
contain Anti-Compression elements that disallow a subsequent compression and
decompression process from delivering a high quality audio experience.
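
The bit-count feedback described above can be sketched as a simple loop. This is
an illustration only and not the iterative quantization of the referenced encoder:
the bit estimate, the multiplicative growth of the step size, the budget figures and
the names bits_for and fit_to_budget are all assumptions made for the example.

```python
import numpy as np

def bits_for(values):
    """Crude bit estimate: magnitude bits plus one sign bit per quantized value."""
    return sum(int(abs(v)).bit_length() + 1 for v in values)

def fit_to_budget(coefficients, budget_bits, extra_bits=0, step=1.0, growth=1.1):
    """Coarsen the quantizer step until audio bits plus extra bits fit the budget."""
    for _ in range(1000):   # guard against a budget that can never be met
        quantized = np.round(coefficients / step).astype(int)
        total = bits_for(quantized) + extra_bits
        if total <= budget_bits:
            return step, total
        step *= growth
    raise ValueError("budget cannot be met")

rng = np.random.default_rng(3)
coefficients = rng.standard_normal(256) * 100.0

BUDGET = 1200      # fixed output bits per block (illustrative)
ACT_BITS = 200     # bits consumed by added anti-compression data (illustrative)

step_plain, bits_plain = fit_to_budget(coefficients, BUDGET)
step_act, bits_act = fit_to_budget(coefficients, BUDGET, extra_bits=ACT_BITS)
```

Both results fit the same fixed budget, but the run that carries the extra
anti-compression bits ends at a step size at least as coarse as the plain run, which
is the trade described in the text: the audio is quantized more coarsely to make
room for the added data.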

It should be noted that the "single ended" one generation codec
approach described above, a technique that does all anti-compression processing of
the input audio signal during the encoding of the compressed audio data stream
without using the decompression decoder as part of the process, is a unique concept.
By permitting the deployment of decompression decoders, which are capable of
playing current content, as well as being able to properly reproduce One Generation
compressed audio content, this methodology allows the establishment of an installed
base of players and customers, before One Generation encoders and One Generation
compressed audio content are generally available. For example, if one were to choose to
make an MP3 compatible One Generation encoder there would be an established base
of hundreds of millions of One Generation MP3 players in the field at the present
time, each player capable of producing anti-compressed audio signals from One
Generation MP3 encoded content.

In the case of the One Generation Codec approach, which employs the
passing of Anti-Compression discontinuity information from the encoder to the
decoder in the data structure of the encoded signal, not in the encoded audio data
itself, the decoding and mixing of the discontinuities with the decoded data stream
take place in the decoder. This has the benefit of permitting the original,
unprocessed encoded data stream to be recovered, if this should be desired, but
requires that the discontinuity information be hidden in the encoded data structure so
it cannot be removed before it is added to the decoded audio data. It should be noted
that a decoder can be constructed such that the discontinuity data is generated as part
of, or as a separate process from, the decoder, using the principles illustrated in Figure
15, with the PCM Audio Input 757 being the PCM decoded output of the
decompression decoder. In this case, no discontinuity information is passed to the
decoder from the encoder. The discontinuity information would be derived from
analysis of the signal characteristics of the decoded audio signal and combined with
the decoded audio signal before it is delivered to the user as a time domain audio
output.

This one-generation approach provides compressed audio data that can 
be stored and distributed in any of a number of ways. The distribution of such audio 
data in a form for use with individual portable audio players is mentioned above. In 
this case, the players contain the software necessary to decompress the data. The 
media storing the compressed data can be any of the commercially available media, 
such as non-volatile semiconductor memory in the player itself or in removable cards, 
small rotating magnetic disk drives and small optical disks. However, it is preferred 
that security techniques be applied to restrict access to such compressed data in order 
to prevent it from being distributed in its compressed form. An audio signal 
decompressed from a copy of the compressed data file will have a high quality. 
Security techniques, such as those described in the Secure Transmission Patent 
Applications referenced above, are therefore desirably applied. 

Another application is with the sound track of motion picture films. 
Sound is commonly recorded in a compressed form. Movies are often videotaped 
by a member of the audience during an opening theater showing. The video 
tape is then used to make copies of the film that are then distributed illegally. In order 
to obtain a good quality sound signal, an infrared audio signal transmission that is 
available in many theaters for use by people who are hard of hearing is intercepted 
and used. This uncompressed sound signal is then recompressed for recordation on 
the copies. If the sound track of the film has been compressed with one of the 
techniques described above, however, the audio signal decompressed from the illegal 
copies will have an unacceptable quality. 

Changing the Audio Signal Processing 

Although the various example implementations of two embodiments of 
the present invention have been described in the form of fixed algorithms applied to 
an input audio signal, all of the algorithmic processes described can be adjusted 
during their application as a function of input audio signal characteristics. The 
objective of this adjustment is to maximize the difference between the processed 
audio signal and the processed audio signal after undergoing audio compression. This 
"adaptive processing", referred to as optimization, can be effectuated by first 
analyzing the amplitude and timing of the input audio signal's frequency components, 
as well as the relationship between the audio data present in each channel of the input 
audio signal, and then using this information to select from multiple processing 
algorithms or to adjust process algorithm parameters and function. Changes to the 
phase, amplitude and frequency modifications, as well as the character of the spurious 
data, introduced in the treated audio signal will directly influence both the quality of 
the uncompressed processed audio signal and the amount the processed audio signal is 
degraded after compression. 
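
The optimization objective stated above can be illustrated with a minimal sketch. It is not the disclosed implementation: the function names, the candidate parameter list, and the "toy_codec" (a moving-average low-pass filter followed by coarse quantization) are all assumptions introduced only for illustration, and the audibility constraint is represented simply by holding the spurious tone amplitude fixed at a low level. The sketch tries each candidate treatment and keeps the one that maximizes the difference between the treated signal and the treated signal after a lossy round trip.

```python
import math

def toy_codec(samples, step=0.05, taps=4):
    """Hypothetical lossy round trip: moving-average low-pass, then coarse quantization."""
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - taps + 1): i + 1]
        avg = sum(window) / len(window)
        out.append(round(avg / step) * step)
    return out

def treat(samples, freq, amp, rate):
    """Add a low-level spurious tone; (freq, amp) play the role of adjustable parameters."""
    return [s + amp * math.sin(2 * math.pi * freq * n / rate)
            for n, s in enumerate(samples)]

def mean_sq_diff(a, b):
    """Mean squared difference between two equal-length sample lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def choose_parameters(samples, candidates, rate):
    """Return the candidate maximizing treated-versus-recompressed difference."""
    best, best_score = None, -1.0
    for freq, amp in candidates:
        treated = treat(samples, freq, amp, rate)
        score = mean_sq_diff(treated, toy_codec(treated))
        if score > best_score:
            best, best_score = (freq, amp), score
    return best, best_score

rate = 8000
signal = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
candidates = [(500, 0.02), (1500, 0.02), (3000, 0.02)]  # illustrative values only
best, score = choose_parameters(signal, candidates, rate)
print(best, score)
```

In a real system the candidate set would be driven by the analysis of frequency components and channel relationships described above, and the score would be weighed against a perceptual masking model rather than a fixed amplitude.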

The block diagram of Figure 9 depicts anti-compression method 619 
which can be used alone to add anti-compression characteristics to uncompressed 
audio signals or as part of a one generation audio compression codec 619 that operates 
on two channel stereo audio signals and tunes anti-compression processing as a 
function of input signal characteristics. For a monophonic implementation, only 
blocks 583, 585, 587, 589 and 593 of 619 would be required because the additional 
blocks shown, 611, 603, 601, 599, 597 and 595, are for second channel relationship 
analysis and second channel anti-compression processing. For an implementation 
with more than two channels, elements of method 619 are replicated to accommodate the 
processing and relationship analysis required by the additional channels. An instance 
of blocks 611, 603, 601, 599, 597, and 595 would be required for each additional 
channel added. In method 619, stereo audio channel number 1 is applied to input line 
617 and stereo audio channel number 2 is applied to input line 605. These two audio 
signals are separated into their individual frequency components by filter bank 583 
and filter bank 603 respectively. Although not depicted, the frequency component 
separation process would normally be digital in nature and require the input signals to 
first be converted to digital form, if they were not already in digital form when 
applied. In addition, filter banks 583 and 603 could be either transform based, as 
employed by signal modification system 511, or sub-band based. If a transform 
based process is employed, a block quantizing step would be required before the 
frequency component separation step performed by blocks 583 and 603. 

The method 619 assumes the use of a sub-band based process, so no 
prior block quantizing step is shown. A sub-band based process uses narrow band 
time domain filters to continuously partition the input audio signal into its critical 
frequency bands. The input audio signal is therefore not transformed into its 
frequency domain representation and thus no block quantizing step is required. The 
frequency component activity analysis derived by blocks 583 and 603, which 
corresponds to block spectrum 533 of system 511, is used by blocks 585 and 601 
respectively to calculate the masking functions associated with each of the two stereo 
channels as well as to derive, for example, temporal audio activity, audio signal 
dynamic range, and audio signal baseline offset. This information is used by spurious 
signal generator blocks 587 and 599 respectively, often in conjunction with data from 
signal relationship block 611, to create spurious signals. These spurious signals are 
combined with the input stereo signals 617 and 605 by adder blocks 593 and 595, and 
the results are output on lines 591 and 621 as anti-compressed treated signals. The 
information is also used by signal 
modification blocks 589 and 597, also often in conjunction with data from block 611, 
to alter, but not add to, the signals output on 591 and 621. For example, time related 
masking curve information from blocks 585 and 601 can be employed by blocks 587 
and 599 to create noise bursts inserted into the output audio signals 591 and 621 that 
are optimized in both timing and in frequency characteristics, so as to maximally 
confuse audio compression codecs employing Huffman encoding techniques, as 
previously described, but which are masked by the audio signal frequency 
components present so they are minimally audible to the listener. Also, the frequency 
and phase relationships between the input audio signals appearing on lines 617 and 
605, that are derived by the actions of block 611, can be used by audio signal 
modification blocks 589 and 597 to adaptively shift the relative phase of frequency 
elements common to both output signals 591 and 621, so as to cause audio 
compression codecs employing joint stereo encoding techniques to be optimally 
confused, as previously described, and produce degraded results. Further, signal 
relationship data from block 611 can be used by blocks 587 and 599 to add out of 
phase extraneous signals into each of the output channels, through the use of blocks 
593 and 595, that can only be heard if the stereo output signal is compressed with an 
audio compression codec using absolute value addition techniques, as was also 
previously described, thus again causing poor results from a subsequent 
compression/decompression process. 

In a typical application of either the first or second embodiment of the 
present invention, each of multiple incoming audio signals is modified according to a 
common algorithm. In the event that a computer hacker is able to ascertain that 
algorithm and then use that information to remove the modifications from an audio 
signal, the algorithm can be changed by a content provider for subsequent audio 
signal processing. This would then make it necessary for the hacker to determine the 
new algorithm each time it is changed. Alternatively, many different algorithms can 
be alternately used by content providers in order to make the task of removing the 
modifications from the signal even more difficult. This notion can be taken one step 
further by using a different algorithm on different parts of the same song or other 
audio content. In addition to causing greater challenges for computer hackers in their 
efforts to compromise the beneficial effects of the audio processing being disclosed, this 
will allow a single song to be tailored to the characteristics of multiple audio 
compression technologies and thus prevent this processed song from being 
compressed with quality by a large number of different compression encoder 
algorithms. 
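
The idea of applying a different algorithm to different parts of the same song can be sketched as follows. This is only an illustration under stated assumptions: the text above does not say how an algorithm is chosen for each part, so the keyed hash used here, the function names, and the two placeholder treatments are hypothetical stand-ins for real anti-compression algorithms.

```python
import hashlib

def pick_algorithm(key, segment_index, num_algorithms):
    """Assumed selection rule: a keyed hash makes the per-segment choice hard to predict."""
    digest = hashlib.sha256(f"{key}:{segment_index}".encode()).digest()
    return digest[0] % num_algorithms

def process_in_segments(samples, segment_len, algorithms, key):
    """Apply one of several treatments to each consecutive segment of the signal."""
    out, used = [], []
    for start in range(0, len(samples), segment_len):
        idx = pick_algorithm(key, start // segment_len, len(algorithms))
        used.append(idx)
        out.extend(algorithms[idx](samples[start:start + segment_len]))
    return out, used

# Placeholder treatments standing in for distinct anti-compression algorithms.
algorithms = [
    lambda seg: [s + 0.001 for s in seg],
    lambda seg: [s * 1.001 for s in seg],
]

samples = [0.1 * n for n in range(10)]
treated, used = process_in_segments(samples, 4, algorithms, key="provider-secret")
print(used)
```

Because the choice depends on a key held by the content provider, changing the key changes the sequence of algorithms without changing the processing code.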

Electronic Measure of Perceptibility 

Although it is the perception by ordinary human listeners of audio 
signals processed by the various techniques described above that is ultimately 
important, the perceptibility of the processing techniques can be measured by 
electronic means. In the examples of the first embodiment described above, the effect 
of anti-compression processing on an input audio signal before undergoing a 
compression step can be measured in this way. The anti-compressed processed signal 
is first passed through a series of bandpass filters in order to decompose this signal 
into the frequency components that comprise the processed audio signal. The input 
audio signal is also passed through a series of bandpass filters in order to decompose 
this signal into the frequency components that comprise the input audio signal. The 
unprocessed signal is subtracted from the anti-compressed processed signal to obtain 
the frequency components added to the input audio signal that comprise the added 
anti-compression signal. The added anti-compression signal is then compared, by use 
of a spectrum analyzer, with well known human hearing masking curves, which are 
used in all perceptual compression encoders, to determine the audibility of the applied 
anti-compression signal as it appears in the anti-compressed version of the original 
audio signal. 
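
The measurement just described can be sketched in a few lines. The sketch makes several assumptions that are not in the text above: a plain discrete Fourier transform stands in for the series of bandpass filters, the band edges are arbitrary, and the masking thresholds are fixed hypothetical numbers rather than curves computed from the audio. Because linear filtering and subtraction commute, the sketch subtracts first and decomposes the difference.

```python
import cmath
import math

def band_energies(samples, rate, band_edges):
    """Energy per band from a DFT (a stand-in for the bank of bandpass filters)."""
    n = len(samples)
    energies = [0.0] * (len(band_edges) - 1)
    for k in range(n // 2 + 1):
        freq = k * rate / n
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(samples))
        power = abs(coeff) ** 2 / n ** 2
        for b in range(len(energies)):
            if band_edges[b] <= freq < band_edges[b + 1]:
                energies[b] += power
    return energies

def audible_bands(original, processed, rate, band_edges, masking_thresholds):
    """Return indices of bands where the added signal exceeds its masking threshold."""
    added = [p - o for p, o in zip(processed, original)]
    energies = band_energies(added, rate, band_edges)
    return [b for b, (e, t) in enumerate(zip(energies, masking_thresholds)) if e > t]

rate, n = 8000, 256
original = [0.5 * math.sin(2 * math.pi * 500 * i / rate) for i in range(n)]
processed = [o + 0.01 * math.sin(2 * math.pi * 1000 * i / rate)
             for i, o in enumerate(original)]
edges = [0, 750, 1500, 4001]          # illustrative band edges in Hz
print(audible_bands(original, processed, rate, edges, [1e-4, 1e-4, 1e-4]))
print(audible_bands(original, processed, rate, edges, [1e-4, 1e-6, 1e-4]))
```

With the looser thresholds no band is flagged, and with the tighter middle threshold only the band containing the added 1000 Hz component is flagged.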

The effect of the processing in the examples of the second embodiment 
described above can also be measured by electronic techniques. What is measured is 
the effect of anti-compression processing on a decompressed audio signal derived from 
an input audio signal that has undergone anti-compression processing and a 
compression encoding step. Discontinuities in this decompressed audio data stream 
are analyzed. The decompressed audio data stream is frequency decomposed by using 
a series of bandpass filters. The average energy of the decompressed audio data 
stream under test is measured on a frequency bin basis. The deviations from these 
average energy values are then measured at the times at which anti-compression 
elements were added to the input, uncompressed, audio data stream. These energy 
variations are then electronically compared, on a frequency bin basis, with well 
known human masking curves, by means of an audio spectrum analyzer, to determine 
a measure of the audibility of the anti-compression signal included in the output 
decompressed signal. 
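
A minimal sketch of this second measurement follows. It assumes, purely for illustration, that the decompressed stream is already divided into equal-length frames, that a per-frame discrete Fourier transform stands in for the bandpass filters, that the frame indices at which anti-compression elements were added are known, and that the masking thresholds are fixed hypothetical values per bin.

```python
import cmath
import math

def frame_bin_energies(frame, num_bins):
    """Energy in each of the first num_bins DFT bins of one frame."""
    n = len(frame)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                    for i, s in enumerate(frame))) ** 2 / n ** 2
            for k in range(num_bins)]

def deviation_audibility(frames, marked, thresholds):
    """For each marked frame, list bins whose energy deviation exceeds the threshold."""
    num_bins = len(thresholds)
    energies = [frame_bin_energies(f, num_bins) for f in frames]
    average = [sum(e[k] for e in energies) / len(energies) for k in range(num_bins)]
    report = {}
    for t in marked:
        deviations = [abs(energies[t][k] - average[k]) for k in range(num_bins)]
        report[t] = [k for k in range(num_bins) if deviations[k] > thresholds[k]]
    return report

n = 32
base = [0.5 * math.sin(2 * math.pi * 2 * i / n) for i in range(n)]
burst = [b + 0.2 * math.sin(2 * math.pi * 5 * i / n) for i, b in enumerate(base)]
frames = [base, base, burst, base]     # frame 2 carries an added element
report = deviation_audibility(frames, marked=[2], thresholds=[0.001] * 8)
print(report)
```

Here only bin 5 of frame 2 deviates from the per-bin average by more than its threshold, which is where the added component was placed.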

Video and Other Applications 

The techniques of processing digital signal files have been described 
above for use with audio signals. The protection of the transmission and sharing of 
audio content is currently a big concern, primarily because of the ease with which 
such content can be distributed over the Internet and on physical storage media. But 
the same approaches can also be applied to reduce the incentive to copy or transfer 
other types of data files, when that becomes desirable. Commercial movies and other 
video content are an example of content that can be similarly processed. Although the 
transmission of compressed video data files over the Internet and other 
communications networks is not now widespread because the bandwidth requirements 
exceed that available from the communications networks, this is likely to change in 
the future. 

Since most video, when in a digital form, is compressed, the 
techniques of the second embodiment described above for compressing audio data can 
also be used when compressing the video data. Although the compression and 
decompression algorithms are necessarily different, their characteristics are similar to 
those used with sound. A decompressed video signal, such as one obtained from a 
DVD disc, cannot be satisfactorily copied and again compressed since the 
video signal, after the second compression and decompression, will have high levels 
of noise and distortion that make the 
video unpleasant for a viewer to watch. This is especially the case when the video 
image repeatedly switches between a reasonably good image and a very poor image, 
or between two levels of poor images. 



Conclusion 

The present invention is fundamental to the processing of either 
original or compressed signals to make them unsuitable for any further compression. 
The invention is particularly suitable for use with signals that are interfaced with 
humans, such as audio, particularly music, and video signals, since the poor quality of 
unauthorized copies will not be tolerated by humans. Although the various aspects of 
the present invention have been described with respect to specific embodiments and 
examples thereof, it will be understood that the invention is entitled to protection 
within the full scope of the appended claims. 

IT IS CLAIMED: 

1. A method of processing a human interface signal, comprising 
modifying the interface signal in a manner minimizing the perceptibility of the 
modification when the interface signal is reproduced but which modifies the signal 
sufficiently so that a reduced quality is perceptible in a signal reproduced from a 
compressed version of the modified signal upon its decompression. 

2. The method of claim 1, wherein the interface signal is an audio 
signal and the reproduced signal is a sound signal. 

3. The method of claim 2, wherein modifying the audio signal 
includes increasing levels of certain frequency components of the audio signal. 

4. The method of claim 2, wherein modifying the audio signal 
includes ascertaining spectral distributions of temporally successive blocks of data of 
the audio signal, determining masking functions for individual ones of the spectral 
distributions of data, an individual masking function defining upper levels of 
frequency components of its associated block of data to which perception of the signal 
does not change, and increasing the levels of at least some of the frequency 
components of the spectral distributions below their respective masking functions. 

5. The method of claim 2, wherein the audio signal includes at 
least first and second channel signals, and wherein modifying the signal includes 
altering a relationship between said at least first and second channel signals. 

6. The method of claim 5, wherein altering relationships includes 
altering amplitude, timing or phase relationships between said at least first and second 
channel signals. 

7. The method of claim 5, wherein modifying the audio signal 
additionally includes utilizing the relationship between said at least first and second 
channel signals to unmask components of the audio signal that are masked. 

8. The method of claim 2, wherein modifying the audio signal 
further includes doing so in a manner which causes a sound data compression and 
decompression algorithm, when compressing the modified audio signal, to at least 
part of the time invoke at least one compression mode that is different from that 
which is invoked by the audio signal alone in order that the compressed version 
thereof results in a version of the audio signal that is perceptible upon its 
decompression to be undesirably changed. 

9. The method of claim 8, wherein modifying the audio signal 
further includes doing so in a manner which causes the compression and 
decompression algorithm to compress the modified audio signal by invoking said at 
least one algorithm compression mode that is alternately the same and different from 
that which is invoked by the original audio signal alone. 

10. The method of claim 8, wherein the audio signal includes two 
or more audio channels and the sound data compression and decompression algorithm 
includes at least two compression modes, a first mode wherein data of each of the two 
or more channels of the audio signal are compressed separately and a second mode 
wherein data of the audio signal of the two or more channels are combined together 
prior to compression. 

11. The method of claim 2, wherein modifying the audio signal 
includes non-continuously removing at least one component from the audio signal. 

12. The method of claim 2, additionally comprising initially 
decompressing the audio signal from a compressed version thereof received over a 
communications network, the initial decompression and the modification of the 
decompressed audio signal being carried out in a processor unit that isolates the 
decompressed audio signal from a user prior to its modification. 

13. The method of any one of claims 1-12, additionally comprising 
recording the modified signal in a physical storage medium. 

14. The method according to claim 1, wherein modifying the signal 
additionally includes doing so in a manner that also minimizes the perceptibility of 
the modification when the signal is compressed and decompressed a first time but 
wherein said reduced quality is perceptible in the signal when reproduced from a 
decompression of the second compression of the signal. 

15. The method of claim 14, wherein the interface signal is an audio 
signal and the reproduced signal is a sound signal. 

16. The method of claim 15, wherein modifying the audio signal 
includes adding noise or audio data thereto. 

17. The method of claim 16, wherein the noise or audio data is 
added to the audio signal in recurring bursts. 

18. The method according to any one of claims 14-17, 
additionally comprising recording the signal in a first compressed version thereof in a 
physical storage medium. 

19. A method of compressing a human interface signal, comprising 
modifying a process of its compression in a manner that minimizes the perceptibility 
of a resulting change to the signal when decompressed from said compression but 
which results in a second signal having a reduced quality when reproduced from a 
second compression and decompression of the decompressed audio signal. 

20. The method of claim 19, wherein the interface signal is an audio 
signal and the second signal is a sound signal. 

21. The method of claim 20, wherein modifying the compression 
process includes altering timing of processing of defined time sequential blocks of 
data of the audio signal. 

22. The method of claim 20, wherein modifying the compression 
process includes doing so as a function of at least one characteristic of the audio 
signal. 

23. The method of claim 20, wherein modifying the compression 
process includes using a quantizer adjusted to quantize individual frequency 
components of the audio signal in a manner that avoids the perceptibility of 
quantizing errors in the audio signal when decompressed from said compression but 
which renders quantizing errors perceptible in a sound signal reproduced from the 
second compression and decompression of the decompressed audio signal. 

24. The method of claim 20, wherein modifying the compression 
process includes adding encoded discontinuities to data resulting from compression of 
the audio signal. 

25. The method of claim 24, wherein the encoded discontinuities 
are characterized by invoking at least part of the time in a second compression at least 
one compression mode that is different from that which is invoked without the 
discontinuities. 

26. The method of claim 25, wherein the encoded discontinuities 
are further characterized by intermittently invoking said at least one compression 
mode. 

27. The method of any one of claims 19-26, additionally 
comprising recording the compressed signal in a physical storage medium. 

28. An audio signal in a form allowing reproduction thereof, 
comprising audio content that has been modified in a manner minimizing the 
perceptibility of the modification when the audio signal is reproduced but which 
causes the audio content to have a reduced quality when the audio signal is 
compressed and decompressed. 

29. The audio signal according to claim 28, wherein the 
modifications of the audio content include increased levels of certain frequency 
components of the audio content below masking levels. 

30. The audio signal according to claim 28, wherein the 
modifications of the audio content are characterized by causing a sound compression 
and decompression algorithm to compress the audio signal at least part of the time by 
invoking at least one compression mode that is different than that which would be 
invoked by the audio content alone. 

31. The audio signal according to claim 30, wherein the 
modifications of the audio content are further characterized by causing the 
compression and decompression algorithm to intermittently invoke said at least one 
different compression mode. 

32. The audio signal according to claim 28, wherein the audio 
signal includes a single audio selection, title, song or portion thereof. 



33. The audio signal of any one of claims 28-32 stored on a 
physical storage medium. 

34. The audio signal of claim 33, wherein the physical storage 
medium is selected from a group consisting of a magnetic storage device including a 
computer disk or an audio tape cassette, an optical storage device including a 
Compact Disc or a Digital Video Disc, motion picture film and a non-volatile 
semiconductor memory card. 



35. A compressed version of an audio signal in a form allowing 
decompression and reproduction thereof, comprising a compressed version of audio 
content that has been modified in a manner minimizing the perceptibility of the 
modification when the audio signal is decompressed but which causes the audio 
content to have a reduced quality when the decompressed audio signal is compressed 
and decompressed for a second time. 

36. The compressed audio signal according to claim 35, wherein 
the compressed audio signal is characterized by invoking at least part of the time in a 
second compression of the decompressed audio signal at least one compression mode 
that is different from that which is invoked without the modification to the audio 
content. 



37. The compressed audio signal according to claim 36, wherein 
the compressed audio signal is further characterized by intermittently invoking the 
different compression mode in a second compression. 

38. The audio signal according to claim 35, wherein the audio 
signal includes a single audio selection, title, song or part thereof. 

39. The audio signal of any one of claims 35-38 stored on a 
physical storage medium. 

40. The audio signal of claim 39, wherein the physical storage 
medium is selected from a group consisting of a magnetic storage device including a 
computer disk or an audio tape cassette, an optical storage device including a 
Compact Disc or a Digital Video Disc, motion picture film and a non-volatile 
semiconductor memory card. 



41. A signal processing device, comprising a memory and a 
processor controlled to modify an encrypted compressed input audio content signal to 
produce an unencrypted decompressed output signal with modifications selected to 
not be perceived but which, if the output signal were to be compressed and then 
decompressed a second time, would generate a second decompressed signal of poor 
quality, the processor and memory being protected to prevent a user from having 
ready access to an unencrypted version of said signal without said modifications. 

42. A signal processing device, comprising a memory and a 
processor controlled to unencrypt and decompress an encrypted compressed input 
audio signal that has been processed so that an unencrypted decompressed output 
signal therefrom carries modifications selected to not be perceived but which, if the 
output signal were to be compressed and then decompressed a second time, would 
generate a second decompressed signal of poor quality, the processor and memory 
being protected to prevent a user from having ready access to an unencrypted version 
of said signal without said modifications. 

43. The signal processing device of either one of claims 41 or 42, 
wherein the module is in the form of a card that is removably insertable into a sound 
reproducing device. 

44. A system for processing an input audio signal to generate a 
modified version thereof as an output audio signal, comprising: 

an analyzer receiving the input signal that determines acoustic 
elements of the input signal, 

a function generator that receives the input signal acoustic elements 
and generates a function in response thereto that, when combined with the input 
signal, generates the output signal that is perceptively substantially the same as the 
input signal but which, when compressed and decompressed, would produce a sound 
signal that is perceptively significantly inferior to the input signal, and 

a combiner of the input signal and the function that provides the output 
audio signal. 

45. The system of claim 44, wherein the function generator 
includes a degradation function generator that modifies the input signal in a manner 
that the degradation would be perceptible in said sound signal. 

46. The system of claim 44, wherein the function generator 
includes a forcing function generator that would cause an algorithm compressing the 
output signal to operate in an incorrect mode at least part of the time. 

47. The system of any one of claims 45 or 46, wherein the function 
generator includes a masking function generator that operates in response to the 
acoustic elements of the input signal to reduce the perceptibility of the generated 
function in the output signal prior to any compression thereof. 

48. An audio signal processing system, comprising: 

an audio data compressor that receives an input audio signal and 
generates a compressed version thereof as an output audio signal, 

an analyzer receiving the input signal that determines acoustic 
elements of the input signal, 

a function generator that receives the input signal acoustic elements 
and generates a function in response thereto that, when inserted into the data 
compressor, causes the output signal from the data compressor to allow a sound signal 
to be decompressed therefrom that is perceptively substantially the same as the input 
signal but which, when compressed and decompressed a second time, would produce 
a second sound signal that is perceptively significantly inferior to the input signal, and 

an inserter of the function into the data compressor. 

49. The system of claim 48, wherein the function generator 
includes a degradation function generator that modifies the input signal in a manner 
that the degradation would be perceptible in said second sound signal. 

50. The system of claim 48, wherein the function generator 
includes a forcing function generator that would cause an algorithm compressing the 
output signal a second time to operate in an incorrect mode at least part of the time. 

51. The system of any one of claims 49 or 50, wherein the function 
generator includes a masking function generator that operates in response to the 
acoustic elements of the input signal to reduce the perceptibility of the generated 
function in the sound signal. 